using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
using LinearAlgebra
# a 3x2 matrix
A = [-3 -4; 4 6; 1 1]
# X1 is a left inverse of A
X1 = [-11//9 -10//9 16//9; 7//9 8//9 -11//9]
Int.(X1 * A)
# X2 is another left inverse of A
X2 = [0 -1//2 3; 0 1//2 -2]
Int.(X2 * A)
# a 2x3 matrix
A = [1 0 1; 0 1 1]
# X1 is a right inverse of A
X1 = [1//2 -1//2; -1//2 1//2; 1//2 1//2]
Int.(A * X1)
# X2 is another right inverse of A
X2 = [1 0; 0 1; 0 0]
Int.(A * X2)
# X3 is yet another right inverse of A
X3 = [1 -1; 0 0; 0 1]
Int.(A * X3)
TODO: picture.
For $\mathbf{A} \in \mathbb{R}^{m \times n}$,
(1) $\mathbf{A}$ is right invertible if and only if $\text{rank}(\mathbf{A}) = m$, or equivalently, $\mathbf{A}$ has full row rank, or equivalently $\mathbf{A}$ has independent rows. (Only a square or wide matrix can have right inverses.)
(2) $\mathbf{A}$ is left invertible if and only if $\text{rank}(\mathbf{A}) = n$, or equivalently, $\mathbf{A}$ has full column rank, or equivalently $\mathbf{A}$ has independent columns. (Only a square or tall matrix can have left inverses.)
Proof of (1). For the if part, suppose $\text{rank}(\mathbf{A}) = m$. Since column rank equals row rank, the columns of $\mathbf{A}$ span $\mathbb{R}^m$. Then for each column $\mathbf{e}_j$ of $\mathbf{I}_m$, there exists $\mathbf{c}_j$ such that $\mathbf{A} \mathbf{c}_j = \mathbf{e}_j$. Collecting the $\mathbf{c}_j$ as columns of $\mathbf{C}$, we obtain $\mathbf{A} \mathbf{C} = \mathbf{I}_m$, so $\mathbf{C}$ is a right inverse.
For the only if part, since $\mathbf{A} \mathbf{C} = \mathbf{I}_m$, we have $m \ge \text{rank}(\mathbf{A}) \ge \text{rank}(\mathbf{A} \mathbf{C}) = \text{rank}(\mathbf{I}_m) = m$. Thus $\text{rank}(\mathbf{A}) = m$.
Proof of (2). Apply (1) to $\mathbf{A}'$.
If $\mathbf{A} \in \mathbb{R}^{m \times n}$ has a left inverse $\mathbf{X}$ and a right inverse $\mathbf{Y}$, then
$\mathbf{A}$ is square and has a unique left inverse and a unique right inverse, which coincide; we call this common matrix the inverse of $\mathbf{A}$ and denote it by $\mathbf{A}^{-1}$.
Proof: From the previous theorem, we have $m = \text{rank}(\mathbf{A}) = n$. Thus $\mathbf{A}$ is square, and so are $\mathbf{X}$ and $\mathbf{Y}$. Then $$ \mathbf{X} = \mathbf{X} \mathbf{I}_n = \mathbf{X} (\mathbf{A} \mathbf{Y}) = (\mathbf{X} \mathbf{A}) \mathbf{Y} = \mathbf{Y}. $$ To show the uniqueness of the left inverse, suppose $\mathbf{Z}$ is another left inverse. Then $$ \mathbf{Z} \mathbf{A} = \mathbf{I}_n = \mathbf{X} \mathbf{A}. $$ Multiplying both sides on the right by the right inverse $\mathbf{Y}$, we obtain $\mathbf{Z} = \mathbf{X}$. Thus the left inverse is unique. Uniqueness of the right inverse is shown in a similar fashion.
Invertibility conditions. Combining the results above, for a square matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$, the following are equivalent: $\mathbf{A}$ is invertible; $\text{rank}(\mathbf{A}) = n$ (full rank); $\mathbf{A}$ has independent columns; $\mathbf{A}$ has independent rows; $\mathbf{A}$ has a left inverse; $\mathbf{A}$ has a right inverse.
Let $\mathbf{A} \in \mathbb{R}^{m \times n}$. Then
$\mathbf{A}$ has full column rank if and only if $\mathbf{A}'\mathbf{A}$ is nonsingular. In this case, $(\mathbf{A}'\mathbf{A})^{-1} \mathbf{A}'$ is a left inverse of $\mathbf{A}$.
$\mathbf{A}$ has full row rank if and only if $\mathbf{A}\mathbf{A}'$ is nonsingular. In this case, $\mathbf{A}'(\mathbf{A} \mathbf{A}')^{-1}$ is a right inverse of $\mathbf{A}$.
Proof: $\mathbf{A}'\mathbf{A}\mathbf{x} = \mathbf{0}$ implies $\mathbf{x}'\mathbf{A}'\mathbf{A}\mathbf{x} = \|\mathbf{A}\mathbf{x}\|^2 = 0$, i.e., $\mathbf{A}\mathbf{x} = \mathbf{0}$; conversely $\mathbf{A}\mathbf{x} = \mathbf{0}$ implies $\mathbf{A}'\mathbf{A}\mathbf{x} = \mathbf{0}$. Hence $\mathbf{A}'\mathbf{A}$ is nonsingular if and only if $\mathbf{A}$ has full column rank, and in that case $(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{A} = \mathbf{I}_n$ shows $(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'$ is a left inverse. The second statement follows by applying the first to $\mathbf{A}'$.
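A quick numerical check of both formulas in Julia; the names Atall and Awide below are our own illustrative examples (the tall and wide matrices from the beginning of this section).
# Atall has full column rank, so (A'A)⁻¹A' is a left inverse
Atall = [-3.0 -4.0; 4.0 6.0; 1.0 1.0]
X_left = (Atall' * Atall) \ Atall'
X_left * Atall ≈ I(2)
# Awide has full row rank, so A'(AA')⁻¹ is a right inverse
Awide = [1.0 0.0 1.0; 0.0 1.0 1.0]
X_right = Awide' / (Awide * Awide')
Awide * X_right ≈ I(2)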
A = [-1 1 -3; 1 -1 1; 2 2 2]
A⁻¹ = inv(A)
A⁻¹ * A
A * A⁻¹
Examples of matrix inverse:
Identity matrix $\mathbf{I}_n$ is invertible, with inverse $\mathbf{I}_n$.
A diagonal matrix is invertible if and only if all diagonal entries are nonzero, in which case $$ \text{diag}(a_{11}, \ldots, a_{nn})^{-1} = \text{diag}(a_{11}^{-1}, \ldots, a_{nn}^{-1}). $$
A $2 \times 2$ matrix is invertible if and only if $a_{11} a_{22} \ne a_{12} a_{21}$, in which case $$ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1} = \frac{1}{a_{11} a_{22} - a_{12} a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}. $$
Inverse of orthogonal matrix. If $\mathbf{A}$ is square and has orthonormal columns, then $$ \mathbf{A}^{-1} = \mathbf{A}'. $$
Inverse of matrix transpose. If $\mathbf{A}$ is invertible, then $$ (\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'. $$
Inverse of matrix product. If $\mathbf{A}$ and $\mathbf{B}$ are invertible, then $$ (\mathbf{A} \mathbf{B})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}. $$
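Quick numerical checks of several of the identities above; the matrices D, Q, B, C below are our own illustrative examples.
# diagonal matrix: inverse is the diagonal of reciprocals
D = Diagonal([2.0, 3.0, 4.0])
inv(D) ≈ Diagonal([1/2, 1/3, 1/4])
# 2x2 formula
B = [1.0 2.0; 3.0 4.0]
inv(B) ≈ [B[2, 2] -B[1, 2]; -B[2, 1] B[1, 1]] / (B[1, 1] * B[2, 2] - B[1, 2] * B[2, 1])
# orthogonal matrix: inverse equals transpose
Q = [cos(0.3) -sin(0.3); sin(0.3) cos(0.3)]
inv(Q) ≈ Q'
# inverse of transpose and inverse of a product
C = [2.0 0.0; 1.0 3.0]
inv(B') ≈ inv(B)'
inv(B * C) ≈ inv(C) * inv(B)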
Solving linear system \begin{eqnarray*} a_{11} x_1 + \cdots + a_{1n} x_n &=& b_1 \\ a_{21} x_1 + \cdots + a_{2n} x_n &=& b_2 \\ &\vdots& \\ a_{m1} x_1 + \cdots + a_{mn} x_n &=& b_m \end{eqnarray*} or $$ \mathbf{A} \mathbf{x} = \mathbf{b}, $$ where $\mathbf{A} \in \mathbb{R}^{m \times n}$ (coefficient matrix), $\mathbf{x} \in \mathbb{R}^n$ (solution vector), and $\mathbf{b} \in \mathbb{R}^m$ (right hand side) is a central theme in linear algebra.
Left-invertible matrix (over-determined linear system): If $\mathbf{X}$ is a left inverse of $\mathbf{A}$, then $$ \mathbf{A} \mathbf{x} = \mathbf{b} $$ implies (multiplying both sides on the left by $\mathbf{X}$) $$ \mathbf{x} = \mathbf{X} \mathbf{b}. $$ There is at most one solution: if a solution exists, it must equal $\mathbf{X} \mathbf{b}$. Note there might be no solution! $\mathbf{X} \mathbf{b}$ gives us a candidate solution to check. Why can there be at most one solution? (Recall there can be infinitely many left inverses. Why does each left inverse produce the same solution?)
Right-invertible matrix (under-determined linear system): If $\mathbf{X}$ is a right inverse of $\mathbf{A}$, then $$ \mathbf{A} (\mathbf{X} \mathbf{b}) = \mathbf{b}. $$ This says there is at least one solution $\mathbf{x} = \mathbf{X} \mathbf{b}$.
Invertible matrix: If a square $\mathbf{A}$ is invertible, then $$ \mathbf{A} \mathbf{x} = \mathbf{b} $$ implies (multiplying both sides on the left by $\mathbf{A}^{-1}$) $$ \mathbf{x} = \mathbf{A}^{-1} \mathbf{b}. $$ There is a unique solution.
Example. Consider the 3x2 matrix $$ \mathbf{A} = \begin{pmatrix} -3 & -4 \\ 4 & 6 \\ 1 & 1 \end{pmatrix}. $$
# a 3x2 matrix
A = [-3 -4; 4 6; 1 1]
with left inverses
# X1 is a left inverse of A
X1 = [-11//9 -10//9 16//9; 7//9 8//9 -11//9]
# X2 is another left inverse of A
X2 = [0 -1//2 3; 0 1//2 -2]
The over-determined linear equations $$ \begin{pmatrix} -3 & -4 \\ 4 & 6 \\ 1 & 1 \end{pmatrix} \mathbf{x} = \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix} $$ have a unique solution $\mathbf{x} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, which can be obtained from either of the left inverses.
Int.(X1 * [1, -2 , 0])
Int.(X2 * [1, -2 , 0])
The over-determined linear equations $$ \begin{pmatrix} -3 & -4 \\ 4 & 6 \\ 1 & 1 \end{pmatrix} \mathbf{x} = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} $$ have no solution, since the candidate $\mathbf{x} = \mathbf{X}_1 \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1/9 \\ -1/9 \end{pmatrix}$ does not satisfy $\mathbf{A} \mathbf{x} = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}$.
A * (X1 * [1, -1, 0]) == [1, -1, 0]
The under-determined linear equations $$ \mathbf{A}' \mathbf{y} = \begin{pmatrix} -3 & 4 & 1 \\ -4 & 6 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} $$ have (different) solutions
X1' * [1, 2]
# check solution
A' * (X1' * [1, 2]) == [1, 2]
X2' * [1, 2]
# check solution
A' * (X2' * [1, 2]) == [1, 2]
Recall that both $\mathbf{X}_1'$ and $\mathbf{X}_2'$ are right inverses of $\mathbf{A}'$.
To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is lower triangular $$ \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}. $$
Forward substitution algorithm: $$ \begin{eqnarray*} x_1 &=& b_1 / a_{11} \\ x_2 &=& (b_2 - a_{21} x_1) / a_{22} \\ x_3 &=& (b_3 - a_{31} x_1 - a_{32} x_2) / a_{33} \\ &\vdots& \\ x_n &=& (b_n - a_{n1} x_1 - a_{n2} x_2 - \cdots - a_{n,n-1} x_{n-1}) / a_{nn}. \end{eqnarray*} $$ $1 + 3 + 5 + \cdots + (2n-1) = n^2$ flops.
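A minimal Julia sketch of the forward substitution algorithm; the function name forward_substitution and the example matrix Lex are our own.
# forward substitution for a lower triangular system A x = b;
# assumes A is square, lower triangular, with nonzero diagonal
function forward_substitution(A, b)
    n = length(b)
    x = zeros(n)
    for i in 1:n
        s = b[i]
        for j in 1:i-1
            s -= A[i, j] * x[j]   # subtract contributions of already-computed unknowns
        end
        x[i] = s / A[i, i]
    end
    return x
end
Lex = [2.0 0.0 0.0; 1.0 3.0 0.0; -1.0 2.0 4.0]
bex = [2.0, 7.0, 9.0]
forward_substitution(Lex, bex) ≈ LowerTriangular(Lex) \ bex   # should be true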
To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is upper triangular
$$
\begin{pmatrix}
a_{11} & \cdots & a_{1,n-1} & a_{1n} \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\
0 & 0 & 0 & a_{nn}
\end{pmatrix}
\begin{pmatrix}
x_1 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix} = \begin{pmatrix}
b_1 \\ \vdots \\ b_{n-1} \\ b_n
\end{pmatrix}.
$$
Back substitution algorithm:
$$
\begin{eqnarray*}
x_n &=& b_n / a_{nn} \\
x_{n-1} &=& (b_{n-1} - a_{n-1,n} x_n) / a_{n-1,n-1} \\
x_{n-2} &=& (b_{n-2} - a_{n-2,n-1} x_{n-1} - a_{n-2,n} x_n) / a_{n-2,n-2} \\
&\vdots& \\
x_1 &=& (b_1 - a_{12} x_2 - a_{13} x_3 - \cdots - a_{1,n} x_{n}) / a_{11}.
\end{eqnarray*}
$$
$n^2$ flops.
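A minimal Julia sketch of the back substitution algorithm; the function name back_substitution and the example matrix Uex are our own.
# back substitution for an upper triangular system A x = b;
# assumes A is square, upper triangular, with nonzero diagonal
function back_substitution(A, b)
    n = length(b)
    x = zeros(n)
    for i in n:-1:1
        s = b[i]
        for j in i+1:n
            s -= A[i, j] * x[j]   # subtract contributions of already-computed unknowns
        end
        x[i] = s / A[i, i]
    end
    return x
end
Uex = [4.0 -1.0 2.0; 0.0 3.0 1.0; 0.0 0.0 2.0]
bex = [9.0, 7.0, 4.0]
back_substitution(Uex, bex) ≈ UpperTriangular(Uex) \ bex   # should be true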
For an invertible matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ and $\mathbf{b} \in \mathbb{R}^n$, we can solve the linear equation $\mathbf{A} \mathbf{x} = \mathbf{b}$ by the following steps.
Step 1: Compute the QR factorization $\mathbf{A} = \mathbf{Q} \mathbf{R}$. $2n^3$ flops.
Step 2: Form $\mathbf{Q}' \mathbf{b}$. $2n^2$ flops.
Step 3: Solve the triangular equation $\mathbf{R} \mathbf{x} = \mathbf{Q}' \mathbf{b}$ by back substitution. $n^2$ flops.
The total number of flops is $2n^3 + 3n^2$. The dominant cost is the $2n^3$ flops of the QR factorization in Step 1.
Factor-solve method with multiple right-hand sides. For multiple right-hand sides $\mathbf{b}$, we only need to do Step 1 (the expensive factorization) once; each additional right-hand side costs only the $3n^2$ flops of Steps 2-3.
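A small illustration of the factor-solve idea in Julia; the random data below is our own example.
using LinearAlgebra, Random
Random.seed!(216)
n = 100
Ar = randn(n, n)
b1, b2 = randn(n), randn(n)
F = qr(Ar)        # Step 1: factorize once
x1 = F \ b1       # Steps 2-3 for the first right-hand side
x2 = F \ b2       # Steps 2-3 reuse the same factorization
Ar * x1 ≈ b1 && Ar * x2 ≈ b2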
Compute the inverse, $\mathbf{B} = \mathbf{A}^{-1}$, of an invertible matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$.
Step 1. Compute the QR factorization $\mathbf{A} = \mathbf{Q} \mathbf{R}$. $2n^3$ flops.
Step 2. For $i=1,\ldots,n$, solve the triangular equation $\mathbf{R} \mathbf{b}_i = \mathbf{s}_i$, where $\mathbf{s}_i$ is the $i$-th row of $\mathbf{Q}$ (the $i$-th column of $\mathbf{Q}'$), to obtain the $i$-th column $\mathbf{b}_i$ of $\mathbf{B}$. $n \cdot n^2 = n^3$ flops.
Total computational cost is $2n^3 + n^3 = 3n^3$ flops.
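A sketch of this procedure in Julia, reusing the invertible 3x3 matrix A defined above.
F = qr(A)
B = F.R \ Matrix(F.Q)'   # column i of B solves R * bᵢ = i-th column of Q'
B ≈ inv(A)               # should be true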
Question. Two ways to solve a linear system: qr(A) \ b versus inv(A) * b. Which way is better?
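One way to investigate, as a rough sketch (exact timings vary by machine; the random data is our own example):
n = 1000
Abig = randn(n, n)
bbig = randn(n)
@time x_qr = qr(Abig) \ bbig      # factor, then two cheap solves
@time x_inv = inv(Abig) * bbig    # form the explicit inverse, then multiply
norm(x_qr - x_inv)                # both give (numerically) the same solution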
In this section, we (1) review Gaussian elimination (GE) for solving linear equations $\mathbf{A} \mathbf{x} = \mathbf{b}$, and then (2) show that GE leads to a useful decomposition $$ \mathbf{A} = \mathbf{L} \mathbf{U}. $$
Let's review (from any undergraduate linear algebra textbook) how to solve a linear system $$ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -11 \\ -3 \end{pmatrix}. $$
Stage 1: Let's eliminate variable $x_1$ from equations (2) and (3).
Multiplying equation (1) by $\ell_{21} = 3/2$ and adding it to equation (2) leads to $$ \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ -2 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1 \\ -3 \end{pmatrix}. $$
Multiplying equation (1) by $\ell_{31} = 1$ and adding it to equation (3) leads to $$ \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1 \\ 5 \end{pmatrix}. $$
Stage 2: Let's eliminate variable $x_2$ from equation (3).
Multiplying equation (2) by $\ell_{32} = -4$ and adding it to equation (3) leads to $$ \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1 \\ 1 \end{pmatrix}. $$
Stage 3: Now we can collect results by back-solve or back-substitution.
From equation (3), $x_3 = -1$.
Substituting $x_3 = -1$ into equation (2) and solving gives $x_2 = 3$.
Substituting $x_2 = 3$ and $x_3 = -1$ into equation (1) and solving gives $x_1 = 2$.
This is essentially how a computer solves linear equations:
using LinearAlgebra
A = [2.0 1.0 -1.0; -3.0 -1.0 2.0; -2.0 1.0 2.0]
b = [8.0, -11.0, -3.0]
# Julia way to solve linear equation
# equivalent to `solve(A, b)` in R
A \ b
Let's collect those multipliers $-\ell_{ij}$ into a unit lower triangular matrix $\mathbf{L}$
L = [1.0 0.0 0.0; -1.5 1.0 0.0; -1.0 4.0 1.0]
and save the resultant upper triangular matrix after GE into $\mathbf{U}$
U = [2.0 1.0 -1.0; 0.0 0.5 0.5; 0.0 0.0 -1.0]
Surprisingly, we find that $\mathbf{A} = \mathbf{L} \mathbf{U}$!
A ≈ L * U
Theorem: For a nonsingular $\mathbf{A} \in \mathbb{R}^{n \times n}$ whose leading principal submatrices are all nonsingular, there exist a unique unit lower triangular matrix $\mathbf{L}$ and a unique upper triangular matrix $\mathbf{U}$ such that $$ \mathbf{A} = \mathbf{L} \mathbf{U}. $$ (A general nonsingular $\mathbf{A}$ may require row permutations first; see pivoting below.)
Given LU decomposition $\mathbf{A} = \mathbf{L} \mathbf{U}$, for a new right hand side $\mathbf{b}$, the solution to $\mathbf{A} \mathbf{x} = \mathbf{L} \mathbf{U} \mathbf{x} = \mathbf{b}$ is readily given by two triangular solves: \begin{eqnarray*} \mathbf{L} \mathbf{y} &=& \mathbf{b} \\ \mathbf{U} \mathbf{x} &=& \mathbf{y}. \end{eqnarray*}
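Continuing the example above, the two triangular solves in Julia (reusing the L, U, b, A already defined):
# forward solve L y = b, then back solve U x = y
y = LowerTriangular(L) \ b
x = UpperTriangular(U) \ y
A * x ≈ b   # should be true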
Indeed, the computer performs GE/LU on a row-permuted version of $\mathbf{A}$ (pivoting) for numerical stability. That is, $$ \mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}, $$ where $\mathbf{P}$ is a permutation matrix. All multipliers $\ell_{ij}$ in $\mathbf{L}$ have magnitude $\le 1$.
Total computational cost of LU decomposition is $(2/3)n^3$ flops.
A
Alu = lu(A)
Alu.p
A[Alu.p, :] ≈ Alu.L * Alu.U
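With pivoting, the triangular solves use the permuted right-hand side; continuing with A, b, and Alu from above:
# P A = L U, so A x = b becomes L U x = b[p]
x = Alu.U \ (Alu.L \ b[Alu.p])
A * x ≈ b        # should be true
Alu \ b ≈ x      # the factorization object does the same thing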
Let $\mathbf{A} \in \mathbb{R}^{m \times n}$. A matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ is called the Moore-Penrose inverse or MP inverse of $\mathbf{A}$ if it satisfies the following four conditions:
(1) $\mathbf{A} \mathbf{G} \mathbf{A} = \mathbf{A}$. $\quad \quad$ (generalized inverse, $g_1$ inverse, or inner pseudo-inverse).
(2) $\mathbf{G} \mathbf{A} \mathbf{G} = \mathbf{G}$. $\quad \quad$ (outer pseudo-inverse).
(3) $(\mathbf{A} \mathbf{G})' = \mathbf{A} \mathbf{G}$.
(4) $(\mathbf{G} \mathbf{A})' = \mathbf{G} \mathbf{A}$.
We shall denote the Moore-Penrose inverse of $\mathbf{A}$ by $\mathbf{A}^+$.
Any matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ that satisfies (1) is called a generalized inverse, or $g_1$ inverse, or inner pseudo-inverse. Denoted by $\mathbf{A}^-$ or $\mathbf{A}^g$.
Generalized inverse is not unique in general.
Definition (1) for generalized inverses is motivated by the following result.
If $\mathbf{A} \mathbf{x} = \mathbf{b}$ has solution(s), then $\mathbf{A}^- \mathbf{b}$ is a solution for any generalized inverse $\mathbf{A}^-$.
Proof: $\mathbf{A} (\mathbf{A}^- \mathbf{b}) = \mathbf{A} (\mathbf{A}^- \mathbf{A} \mathbf{x}) = (\mathbf{A} \mathbf{A}^- \mathbf{A}) \mathbf{x} = \mathbf{A} \mathbf{x} = \mathbf{b}$.
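A small numerical illustration; the rank-deficient matrix below is our own example, and pinv computes the Moore-Penrose inverse, which in particular is a generalized inverse.
Ar1 = [1.0 2.0; 2.0 4.0; 3.0 6.0]   # a rank-1 matrix
br1 = Ar1 * [1.0, 1.0]              # a right-hand side in the column space, so the system is consistent
xr1 = pinv(Ar1) * br1               # candidate solution from a (particular) generalized inverse
Ar1 * xr1 ≈ br1                     # true: it solves the consistent system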
Any matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ that satisfies (1)+(2) is called a reflexive generalized inverse, or $g_2$ inverse, or outer pseudo-inverse. Denoted by $\mathbf{A}^*$.
The Moore-Penrose inverse of any matrix $\mathbf{A}$ exists and is unique.
Proof of uniqueness: Let $\mathbf{G}_1, \mathbf{G}_2 \in \mathbb{R}^{n \times m}$ be two matrices satisfying properties (1)-(4). Then
$$
\mathbf{A} \mathbf{G}_1 = (\mathbf{A} \mathbf{G}_1)' = \mathbf{G}_1' \mathbf{A}' = \mathbf{G}_1' (\mathbf{A} \mathbf{G}_2 \mathbf{A})' = \mathbf{G}_1' \mathbf{A}' \mathbf{G}_2' \mathbf{A}' = (\mathbf{A} \mathbf{G}_1)' (\mathbf{A} \mathbf{G}_2)' = \mathbf{A} \mathbf{G}_1 \mathbf{A} \mathbf{G}_2 = \mathbf{A} \mathbf{G}_2.
$$
Similarly,
$$
\mathbf{G}_1 \mathbf{A} = (\mathbf{G}_1 \mathbf{A})' = \mathbf{A}' \mathbf{G}_1' = (\mathbf{A} \mathbf{G}_2 \mathbf{A})' \mathbf{G}_1' = \mathbf{A}' \mathbf{G}_2' \mathbf{A}' \mathbf{G}_1' = (\mathbf{G}_2 \mathbf{A})' (\mathbf{G}_1 \mathbf{A})' = \mathbf{G}_2 \mathbf{A} \mathbf{G}_1 \mathbf{A} = \mathbf{G}_2 \mathbf{A}.
$$
Hence,
$$
\mathbf{G}_1 = \mathbf{G}_1 \mathbf{A} \mathbf{G}_1 = \mathbf{G}_1 \mathbf{A} \mathbf{G}_2 = \mathbf{G}_2 \mathbf{A} \mathbf{G}_2 = \mathbf{G}_2.
$$
Proof of existence: TODO later. We construct a matrix that satisfies (1)-(4) using the singular value decomposition (SVD) of $\mathbf{A}$.
The following are true:
$(k \mathbf{A})^+ = (1/k) \mathbf{A}^+$ for $k \ne 0$.
Proof: Check properties (1)-(4).
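A quick numerical check of the scaling property; the matrix M below is our own example.
M = [-3.0 -4.0; 4.0 6.0; 1.0 1.0]
k = 3
pinv(k * M) ≈ (1/k) * pinv(M)   # true for k ≠ 0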
Multiplication by generalized inverse does not change rank.
$\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A} \mathbf{A}^-) = \text{rank}(\mathbf{A}^- \mathbf{A})$.
Proof of 1: We already know $\mathcal{C}(\mathbf{A}) \supseteq \mathcal{C}(\mathbf{A} \mathbf{A}^-)$. Now since $\mathbf{A} = \mathbf{A} \mathbf{A}^- \mathbf{A}$, we also have $\mathcal{C}(\mathbf{A}) \subseteq \mathcal{C}(\mathbf{A} \mathbf{A}^-)$. Hence $\mathcal{C}(\mathbf{A}) = \mathcal{C}(\mathbf{A} \mathbf{A}^-)$ and $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A} \mathbf{A}^-)$.
Proof of 2: Since $\mathbf{A} = \mathbf{A} \mathbf{A}^- \mathbf{A}$, $\text{rank}(\mathbf{A}) \le \text{rank}(\mathbf{A}^- \mathbf{A}) \le \text{rank}(\mathbf{A})$, so $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A}^- \mathbf{A})$.
Generalized inverse has equal or larger rank. $\text{rank}(\mathbf{A}^-) \ge \text{rank}(\mathbf{A})$.
Proof: $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A}\mathbf{A}^- \mathbf{A}) \le \text{rank}(\mathbf{A}\mathbf{A}^-) \le \text{rank}(\mathbf{A}^-)$.
Let $\mathbf{A} \in \mathbb{R}^{m \times n}$.
If $\mathbf{A}$ has full column rank, then the MP inverse is $\mathbf{A}^+ = (\mathbf{A}'\mathbf{A})^{-1} \mathbf{A}'$. Given QR factorization $\mathbf{A} = \mathbf{Q} \mathbf{R}$, $\mathbf{A}^+ = \mathbf{R}^{-1} \mathbf{Q}'$.
If $\mathbf{A}$ has full row rank, then the MP inverse is $\mathbf{A}^+ = \mathbf{A}'(\mathbf{A} \mathbf{A}')^{-1}$. Given QR factorization $\mathbf{A}' = \mathbf{Q} \mathbf{R}$, $\mathbf{A}^+ = \mathbf{Q} \mathbf{R}'^{-1}$.
# a 3x2 matrix (full column rank)
A = [-3 -4; 4 6; 1 1]
# compute MP inverse using SVD
pinv(A)
# compute MP inverse via the formula (A'A)⁻¹A' (valid since A has full column rank)
(A'A) \ A'
# compute MP inverse by QR
qra = qr(A)
qra.R \ qra.Q[:, 1:2]'