using LinearAlgebra
1 Generalized inverse and Moore-Penrose inverse
Let \(\mathbf{A} \in \mathbb{R}^{m \times n}\). A matrix \(\mathbf{G} \in \mathbb{R}^{n \times m}\) is called the Moore-Penrose inverse or MP inverse of \(\mathbf{A}\) if it satisfies the following four conditions:
1. \(\mathbf{A} \mathbf{G} \mathbf{A} = \mathbf{A}\).
2. \(\mathbf{G} \mathbf{A} \mathbf{G} = \mathbf{G}\).
3. \((\mathbf{A} \mathbf{G})' = \mathbf{A} \mathbf{G}\).
4. \((\mathbf{G} \mathbf{A})' = \mathbf{G} \mathbf{A}\).
We shall denote the Moore-Penrose inverse of \(\mathbf{A}\) by \(\mathbf{A}^+\).
Any matrix \(\mathbf{G} \in \mathbb{R}^{n \times m}\) that satisfies (1) is called a generalized inverse, or \(g_1\) inverse, or inner pseudo-inverse. It is denoted by \(\mathbf{A}^-\) or \(\mathbf{A}^g\).
A generalized inverse is not unique in general.
Definition (1) for generalized inverses is motivated by the following result.
If \(\mathbf{A} \mathbf{x} = \mathbf{b}\) has solution(s), then \(\mathbf{A}^- \mathbf{b}\) is a solution for any generalized inverse \(\mathbf{A}^-\).
Any matrix \(\mathbf{G} \in \mathbb{R}^{n \times m}\) that satisfies (1)+(2) is called a reflexive generalized inverse, or \(g_2\) inverse, or outer pseudo-inverse. It is denoted by \(\mathbf{A}^*\).
The Moore-Penrose inverse of any matrix \(\mathbf{A}\) exists and is unique.
Two useful facts about any generalized inverse: (1) \(\mathcal{C}(\mathbf{A} \mathbf{A}^-) = \mathcal{C}(\mathbf{A})\); (2) \(\text{rank}(\mathbf{A} \mathbf{A}^-) = \text{rank}(\mathbf{A})\).
Proof of 1: We already know \(\mathcal{C}(\mathbf{A}) \supseteq \mathcal{C}(\mathbf{A} \mathbf{A}^-)\). Now since \(\mathbf{A} = \mathbf{A} \mathbf{A}^- \mathbf{A}\), we also have \(\mathcal{C}(\mathbf{A}) \subseteq \mathcal{C}(\mathbf{A} \mathbf{A}^-)\).
Proof of 2: Since \(\mathbf{A} = \mathbf{A} \mathbf{A}^- \mathbf{A}\), \(\text{rank}(\mathbf{A}) \le \text{rank}(\mathbf{A} \mathbf{A}^-) \le \text{rank}(\mathbf{A})\), so equality holds throughout.
A generalized inverse has equal or larger rank: \(\text{rank}(\mathbf{A}^-) \ge \text{rank}(\mathbf{A})\).
If \(\mathbf{A}\) has full column rank, then the MP inverse is \(\mathbf{A}^+ = (\mathbf{A}'\mathbf{A})^{-1} \mathbf{A}'\). Given QR factorization \(\mathbf{A} = \mathbf{Q} \mathbf{R}\), \(\mathbf{A}^+ = \mathbf{R}^{-1} \mathbf{Q}'\).
If \(\mathbf{A}\) has full row rank, then the MP inverse is \(\mathbf{A}^+ = \mathbf{A}'(\mathbf{A} \mathbf{A}')^{-1}\). Given QR factorization \(\mathbf{A}' = \mathbf{Q} \mathbf{R}\), \(\mathbf{A}^+ = \mathbf{Q} \mathbf{R}'^{-1}\).
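Both formulas are easy to check numerically against Julia's built-in pinv (a minimal sketch; the random test matrix is purely illustrative):

```julia
using LinearAlgebra

A = randn(5, 3)                 # full column rank with probability 1

# formula for full column rank: A⁺ = (A'A)⁻¹A'
Aplus = (A'A) \ A'

# via QR: A = QR gives A⁺ = R⁻¹Q'
F = qr(A)
Aplus_qr = UpperTriangular(F.R) \ Matrix(F.Q)'

Aplus ≈ Aplus_qr ≈ pinv(A)      # all three agree
```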
Theorem: The following statements are equivalent: (1) \(\mathbf{A} \mathbf{x} = \mathbf{b}\) has solution(s); (2) \(\mathbf{b} \in \mathcal{C}(\mathbf{A})\); (3) \(\text{rank}((\mathbf{A} \ \mathbf{b})) = \text{rank}(\mathbf{A})\); (4) \(\mathbf{A} \mathbf{A}^- \mathbf{b} = \mathbf{b}\).
Proof: Equivalence between 1, 2, and 3 is trivial. 4 implies 1: apparently \(\tilde{\mathbf{x}} = \mathbf{A}^- \mathbf{b}\) is a solution to \(\mathbf{A} \mathbf{x} = \mathbf{b}\). 1 implies 4: if \(\tilde{\mathbf{x}}\) is a solution, then \(\mathbf{b} = \mathbf{A} \tilde{\mathbf{x}} = \mathbf{A} \mathbf{A}^- \mathbf{A} \tilde{\mathbf{x}} = \mathbf{A} \mathbf{A}^- \mathbf{b}\).
The last equivalence gives some intuition why \(\mathbf{A}^-\) is called an inverse.
2.2 How to characterize all solutions to \(\mathbf{A} \mathbf{x} = \mathbf{b}\)?
Let’s first study the homogeneous case \(\mathbf{A} \mathbf{x} = \mathbf{0}\), which is always consistent (why?).
Theorem: \(\tilde{\mathbf{x}}\) is a solution to \(\mathbf{A} \mathbf{x} = \mathbf{0}\) if and only if \(\tilde{\mathbf{x}} = (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q}\) for some \(\mathbf{q} \in \mathbb{R}^n\).
Proof: If: Apparently \((\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q}\) is a solution regardless of the value of \(\mathbf{q}\), since \(\mathbf{A} (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) = \mathbf{A} - \mathbf{A} = \mathbf{0}\). Only if: If \(\tilde{\mathbf{x}}\) is a solution, then \(\tilde{\mathbf{x}} = (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q}\) by taking \(\mathbf{q} = \tilde{\mathbf{x}}\).
Rephrasing the above result, we have \(\mathcal{N}(\mathbf{A}) = \mathcal{C}(\mathbf{I}_n - \mathbf{A}^- \mathbf{A})\).
Remarks (optional):
The matrix \(\mathbf{I}_n - \mathbf{A}^- \mathbf{A}\) is idempotent and is an oblique projection onto \(\mathcal{N}(\mathbf{A})\).
The matrix \(\mathbf{I}_n - \mathbf{A}^+ \mathbf{A}\) is symmetric and idempotent and is an orthogonal projection onto \(\mathcal{N}(\mathbf{A})\).
The matrix \(\mathbf{A} \mathbf{A}^-\) is idempotent and is an oblique projection onto \(\mathcal{C}(\mathbf{A})\).
The matrix \(\mathbf{A} \mathbf{A}^+\) is symmetric and idempotent and is an orthogonal projection onto \(\mathcal{C}(\mathbf{A})\).
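These projection facts can be verified numerically with the MP inverse from pinv (a minimal sketch on a random rank-deficient matrix):

```julia
using LinearAlgebra

A = randn(4, 2) * randn(2, 5)   # 4×5 matrix of rank 2
P = A * pinv(A)                 # orthogonal projector onto C(A)
Q = I - pinv(A) * A             # orthogonal projector onto N(A)

# symmetric and idempotent (up to roundoff), and A * Q = 0 since C(Q) = N(A)
norm(P' - P) < 1e-8, norm(P^2 - P) < 1e-8, norm(A * Q) < 1e-8
```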
Now for the inhomogeneous case \(\mathbf{A} \mathbf{x} = \mathbf{b}\).
Theorem: If \(\mathbf{A} \mathbf{x} = \mathbf{b}\) is consistent, then \(\tilde{\mathbf{x}}\) is a solution if and only if \[
\tilde{\mathbf{x}} = \mathbf{A}^- \mathbf{b} + (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q}
\] for some \(\mathbf{q} \in \mathbb{R}^n\).
Interpretation: a specific solution + a vector in \(\mathcal{N}(\mathbf{A})\).
Corollary: \(\mathbf{A} \mathbf{x} = \mathbf{b}\) is consistent for all \(\mathbf{b}\) if and only if \(\mathbf{A}\) has full row rank.
Corollary: If a system is consistent, its solution is unique if and only if \(\mathbf{A}\) has full column rank.
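The theorem can be illustrated numerically (a sketch; the system below is manufactured to be consistent by construction):

```julia
using LinearAlgebra

A = randn(4, 2) * randn(2, 5)          # rank 2: many solutions exist
b = A * randn(5)                       # b ∈ C(A), so the system is consistent

Aminus = pinv(A)                       # one particular generalized inverse
x0 = Aminus * b                        # a particular solution
x  = x0 + (I - Aminus * A) * randn(5)  # shift by a vector in N(A)

A * x0 ≈ b && A * x ≈ b                # both are solutions
```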
3 Invertible matrix
Assume that a square \(\mathbf{A} \in \mathbb{R}^{n \times n}\) has full row and column rank. We show below that such an \(\mathbf{A}\) is invertible (or nonsingular).
Consider the left inverse \(\mathbf{X}\) that satisfies \(\mathbf{X} \mathbf{A} = \mathbf{I}\) and the right inverse \(\mathbf{Y}\) that satisfies \(\mathbf{A} \mathbf{Y} = \mathbf{I}\). Both left and right inverses exist (why?) and they are equal: \[
\mathbf{X} = \mathbf{X} \mathbf{I} = \mathbf{X} \mathbf{A} \mathbf{Y} = \mathbf{I} \mathbf{Y} = \mathbf{Y}.
\] Uniqueness of the left and right inverse follows from uniqueness of the basis expansion.
We say that \(\mathbf{A}\) is invertible (or nonsingular), and denote the (left and right) inverse by \(\mathbf{A}^{-1}\).
For an invertible \(\mathbf{A}\), the generalized inverse \(\mathbf{A}^-\) is unique and coincides with \(\mathbf{A}^{-1}\).
Proof: Let \(\mathbf{G}_1\) and \(\mathbf{G}_2\) be two generalized inverses of \(\mathbf{A}\). Then \(\mathbf{A} \mathbf{G}_1 \mathbf{A} = \mathbf{A}\) and \(\mathbf{A} \mathbf{G}_2 \mathbf{A} = \mathbf{A}\). Multiplying both equations by \(\mathbf{A}^{-1}\) on the left and right, we obtain \(\mathbf{G}_1 = \mathbf{A}^{-1} = \mathbf{G}_2\).
For any \(\mathbf{b}\), the unique solution to \(\mathbf{A} \mathbf{x} = \mathbf{b}\) is \(\mathbf{A}^{-1} \mathbf{b}\).
3.1 Examples of matrix inverse
Identity matrix \(\mathbf{I}_n\) is invertible, with inverse \(\mathbf{I}_n\).
A diagonal matrix is invertible if and only if all diagonal entries are nonzero. \[
\text{diag}(a_{11}, \ldots, a_{nn})^{-1} = \text{diag}(a_{11}^{-1}, \ldots, a_{nn}^{-1}).
\]
A \(2 \times 2\) matrix is invertible if and only if \(a_{11} a_{22} \ne a_{12} a_{21}\): \[
\begin{pmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{pmatrix}^{-1} = \frac{1}{a_{11} a_{22} - a_{12} a_{21}} \begin{pmatrix}
a_{22} & -a_{12} \\
-a_{21} & a_{11}
\end{pmatrix}.
\]
Inverse of orthogonal matrix. If \(\mathbf{A}\) is square and has orthonormal columns, then \[
\mathbf{A}^{-1} = \mathbf{A}'.
\]
Inverse of matrix transpose. If \(\mathbf{A}\) is invertible, then \[
(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'.
\]
Inverse of matrix product. If \(\mathbf{A}\) and \(\mathbf{B}\) are invertible, then \[
(\mathbf{A} \mathbf{B})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}.
\]
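A quick numerical confirmation of these identities (a minimal sketch with random matrices, which are invertible with probability 1):

```julia
using LinearAlgebra

A, B = randn(4, 4), randn(4, 4)
Q = Matrix(qr(randn(4, 4)).Q)      # a random orthogonal matrix

inv(Q) ≈ Q' &&                     # orthogonal matrix
    inv(A') ≈ inv(A)' &&           # transpose
    inv(A * B) ≈ inv(B) * inv(A)   # product
```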
For an invertible matrix \(\mathbf{A} \in \mathbb{R}^{n \times n}\) and \(\mathbf{b} \in \mathbb{R}^n\), we can solve the linear equation \(\mathbf{A} \mathbf{x} = \mathbf{b}\) by following steps.
Step 1: Compute the QR factorization \(\mathbf{A} = \mathbf{Q} \mathbf{R}\), e.g., by the Gram–Schmidt or Householder algorithm (HW4). \(2n^3\) flops.
Step 2: Compute \(\mathbf{Q}' \mathbf{b}\). \(2n^2\) flops.
Step 3: Solve the triangular equation \(\mathbf{R} \mathbf{x} = \mathbf{Q}' \mathbf{b}\) by back substitution. \(n^2\) flops.
The total number of flops is \(2n^3 + 3n^2 \approx 2n^3\).
Factor-solve method with multiple right-hand sides. For multiple right-hand sides \(\mathbf{b}\), we only need to do the factorization in Step 1 once.
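In Julia, this corresponds to saving the factorization object and reusing it for each right-hand side (a sketch):

```julia
using LinearAlgebra

A = randn(100, 100)
b1, b2 = randn(100), randn(100)

F = qr(A)        # Step 1: factor once, 2n³ flops
x1 = F \ b1      # Steps 2-3 only: O(n²) flops per right-hand side
x2 = F \ b2

A * x1 ≈ b1 && A * x2 ≈ b2
```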
Compute the inverse, \(\mathbf{B}\), of an invertible matrix \(\mathbf{A} \in \mathbb{R}^{n \times n}\).
Step 1. QR factorization \(\mathbf{A} = \mathbf{Q} \mathbf{R}\).
Step 2. For \(i=1,\ldots,n\), solve the triangular equation \(\mathbf{R} \mathbf{b}_i = \mathbf{s}_i\), where \(\mathbf{b}_i\) is the \(i\)-th column of \(\mathbf{B}\) and \(\mathbf{s}_i\) is the \(i\)-th row of \(\mathbf{Q}\).
Total computational cost is \(2n^3 + n^3 = 3n^3\) flops.
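Since \(\mathbf{A}^{-1} = \mathbf{R}^{-1} \mathbf{Q}'\), Step 2 amounts to one triangular solve per column of \(\mathbf{Q}'\); with a matrix right-hand side, Julia's \ performs all \(n\) solves at once (a sketch):

```julia
using LinearAlgebra

A = randn(5, 5)
F = qr(A)

# solve R * B = Q' column by column (n triangular solves, n³ flops total)
B = UpperTriangular(F.R) \ Matrix(F.Q)'

B ≈ inv(A)
```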
Question. Two ways to solve a linear system. qr(A) \ b vs inv(A) * b. Which way is better?
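A rough empirical comparison (a sketch; exact timings depend on the machine, but the factor-solve route avoids forming \(\mathbf{A}^{-1}\) and is generally cheaper and at least as accurate):

```julia
using LinearAlgebra

n = 2000
A, b = randn(n, n), randn(n)

@time x1 = qr(A) \ b     # factor-solve: no explicit inverse formed
@time x2 = inv(A) * b    # forms A⁻¹ explicitly, then multiplies

norm(A * x1 - b), norm(A * x2 - b)   # compare residuals
```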
6 Solving linear equations by Gaussian elimination and LU factorization
In this section, we (1) review the Gaussian elimination (GE) for solving linear equations \(\mathbf{A} \mathbf{x} = \mathbf{b}\), and then (2) show that GE leads to a useful decomposition \[
\mathbf{A} = \mathbf{L} \mathbf{U}.
\]
Remark: In practice, LU decomposition is used much less frequently than Cholesky decomposition in statistics and machine learning because \(\mathbf{A}\) is almost always symmetric and positive semidefinite in applications.
6.1 Gaussian elimination (GE)
Let’s review (from any undergraduate linear algebra textbook) how to solve a linear system \[
\begin{pmatrix}
2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2
\end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix}
8 \\ -11 \\ -3
\end{pmatrix}.
\]
Stage 1: Let’s eliminate variable \(x_1\) from equations (2) and (3): add \(3/2\) times equation (1) to equation (2) (multiplier \(\ell_{21} = -3/2\)) and add equation (1) to equation (3) (multiplier \(\ell_{31} = -1\)). Stage 2: eliminate variable \(x_2\) from equation (3): subtract \(4\) times equation (2) (multiplier \(\ell_{32} = 4\)). Back substitution on the resulting triangular system gives \(x_3 = -1\), \(x_2 = 3\), \(x_1 = 2\). Collect the multipliers \(\ell_{ij}\) into a unit lower triangular matrix and the reduced coefficients into an upper triangular matrix: \[
\mathbf{L} = \begin{pmatrix} 1 & 0 & 0 \\ -3/2 & 1 & 0 \\ -1 & 4 & 1 \end{pmatrix}, \quad
\mathbf{U} = \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ 0 & 0 & -1 \end{pmatrix}.
\]
Surprisingly, we find that \(\mathbf{A} = \mathbf{L} \mathbf{U}\)!
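To reproduce the check below, enter \(\mathbf{A}\) together with the \(\mathbf{L}\) (multipliers) and \(\mathbf{U}\) (reduced system) obtained from the elimination above:

```julia
using LinearAlgebra

A = [2 1 -1; -3 -1 2; -2 1 2]

L = [1    0 0;        # unit lower triangular matrix of multipliers
     -3/2 1 0;
     -1   4 1]
U = [2 1   -1;        # upper triangular matrix from the reduced system
     0 1/2 1/2;
     0 0   -1]
```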
A ≈ L * U
true
Theorem: For any nonsingular \(\mathbf{A} \in \mathbb{R}^{n \times n}\) whose leading principal minors are all nonzero, there exist a unique unit lower triangular matrix \(\mathbf{L}\) and a unique upper triangular matrix \(\mathbf{U}\) such that \[
\mathbf{A} = \mathbf{L} \mathbf{U}.
\]
Given LU decomposition \(\mathbf{A} = \mathbf{L} \mathbf{U}\), for a new right hand side \(\mathbf{b}\), the solution to \(\mathbf{A} \mathbf{x} = \mathbf{L} \mathbf{U} \mathbf{x} = \mathbf{b}\) is readily given by two triangular solves: \[\begin{eqnarray*}
\mathbf{L} \mathbf{y} &=& \mathbf{b} \\
\mathbf{U} \mathbf{x} &=& \mathbf{y}.
\end{eqnarray*}\]
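In Julia, with \(\mathbf{L}\) and \(\mathbf{U}\) from the worked example above and \(\mathbf{b} = (8, -11, -3)'\), the two triangular solves look like this (a sketch):

```julia
using LinearAlgebra

b = [8, -11, -3]

y = LowerTriangular(L) \ b   # forward substitution: solve L y = b
x = UpperTriangular(U) \ y   # back substitution:    solve U x = y

A * x ≈ b                    # x = [2, 3, -1]
```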
In practice, computers perform GE/LU on a row-permuted version of \(\mathbf{A}\) (partial pivoting) for numerical stability. That is, \[
\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U},
\] where \(\mathbf{P}\) is a permutation matrix. All multipliers \(\ell_{ij}\) in \(\mathbf{L}\) then have magnitude \(\le 1\).
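Julia's lu implements exactly this partially pivoted factorization (a minimal sketch):

```julia
using LinearAlgebra

A = randn(5, 5)
F = lu(A)                       # computes P A = L U with partial pivoting

F.L * F.U ≈ A[F.p, :] &&        # F.p encodes the row permutation P
    maximum(abs, F.L) ≤ 1       # every multiplier has magnitude ≤ 1
```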
Total computational cost of LU decomposition is \((2/3)n^3\) flops.
Remark: Total computational cost of the Cholesky decomposition is \((1/3)n^3\) flops.