
\[ \newcommand{\R}[1]{\mathbb{R}^{#1}} \]

Linear Algebra

Hao Su

Winter, 2022

Note: the slides are based on CSE131 (Juan Carlos et al.) and EE263 (by Stephen Boyd et al.) at Stanford.

Agenda


    Course Logistics

    Exams and Assignments

    Exams (20%)
    • A final exam
    • No in-class exams
    Homework (80%)
    • Programming and Q&A based problems
    • Weekly homeworks without late days
    • You can drop 1 assignment without penalty
    • Expected time to solve all problems in each homework: $\sim 5$ hours
    • Each homework solicits your feedback on teaching (1 free credit)
    Some extra credit opportunities

    Collaboration Policy

    • You can work individually or form study groups of any number of members
    • For study groups,
      • each member must create and submit their own solutions to all problems
      • please clearly indicate the names of your collaborators and the problems you discussed

    Linear Algebra in Computer Vision

    Matrix

    • A matrix $A$ is an array of numbers with size $m$ by $n$, i.e., $m$ rows and $n$ columns \[ A= \begin{bmatrix} a_{11} & a_{12} & a_{13} & \ldots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \ldots & a_{2n} \\ \vdots & & & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \ldots & a_{mn} \end{bmatrix} \]
    • If $m=n$, we say that $A$ is square.
    • To denote the shape of $A$, we often write $A\in\R{m\times n}$

    Images

    • Python represents an image as a matrix of pixel brightness values
    • Note that the upper-left corner is at index $(y,x)=(0,0)$

    Color Images

    • Grayscale images have one number per pixel, and are stored as an $m\times n$ matrix
    • Color images have 3 numbers per pixel — red, green, and blue brightness (RGB)
    • Stored as an $m\times n\times 3$ array (height $\times$ width $\times$ channel)
    • Note: an array with more than two indices, such as an image with channels, is also called a "tensor"

    Code Example: Load an Image

    
    from PIL import Image # a library for image IO
    import numpy as np # a library for linear algebra
    import torch # a library for deep learning
    
    file_path = "./cvpr.png"
    im = Image.open(file_path)
    
    # display the image on screen
    im.show() 
    
    # convert the image to numpy format and then to torch format, and normalize pixel value scale to [0, 1]
    image_tensor = torch.FloatTensor(np.array(im) / 255)
    
    # print the size of the matrix
    print(image_tensor.shape) 
    # out: 
    # torch.Size([853, 1259, 3])
    
    # print the upper-left corner of the first channel of the image
    print(image_tensor[:3, :3, 0])
    # out: 
    # tensor([[0.4510, 0.4471, 0.4471],
    #         [0.4549, 0.4549, 0.4549],
    #         [0.4549, 0.4549, 0.4549]])
                                

    Vector

    • When the number of columns of a matrix is 1, it is called a vector: \[ v= \begin{bmatrix} v_1\\v_2\\\vdots\\v_n \end{bmatrix} \]
    • We often denote the shape of $v$ as $v\in\R{n}$
    • A row vector is $v^T\in\R{1\times n}$, where \[ v^T=[v_1, v_2, \ldots, v_n] \] and $T$ denotes the transpose operation

    Example: Image Embedding

    Place images on the plane so that
    clothes with similar styles are closer \[ \text{position}(I)= \begin{bmatrix} x\\ y \end{bmatrix} \] $\text{position}(\cdot)$ is a function of the image matrix $I$
    http://proc-x.com/2017/09/a-vgg-like-cnn-for-fashion-mnist-with-94-accuracy/

    Image Understanding as Linear Algebra

    • Matrices are the natural representation of images
    • In linear algebra, we have learned a great deal about matrices
    • So we need a solid grasp of linear algebra to understand images

    Basic Roles of Matrices and Vectors in CV

    • To represent images (we have shown examples)
    • To represent geometries, e.g.,
      • 3D vectors are natural representations of points in physical space
      • The quadratic equation of ellipses can be represented as $x^TAx=1$
    • To build transformations to operate on images or geometries, e.g.,
      • Build image filters using matrices to sharpen or smooth images
      • Deep learning uses matrices to learn feature extractors
    Next, we review vectors and matrices as
    data representation and transformations

    Basic Vector and Matrix Operations

    Vector Addition and Scaling

    • Addition \[ \begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} a+1 \\ b+2 \end{bmatrix} \]
    • Scaling \[ \begin{bmatrix} a \\ b \end{bmatrix}\times 3 = \begin{bmatrix} 3a \\ 3b \end{bmatrix} \]
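
    A quick numpy sketch of these two operations (the vector values below are arbitrary examples):

    import numpy as np  # numerical linear algebra library

    v = np.array([1.0, 2.0])            # stands in for [a, b]^T
    print(v + np.array([1.0, 2.0]))     # addition: [2. 4.]
    print(3 * v)                        # scaling:  [3. 6.]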

    Geometry of Vector Addition

    • A 2D vector is an arrow in 2D plane
    • A 3D vector is an arrow in 3D space
    • Vector addition composes vectors

    Dot Product (a.k.a. Inner Product)

    • Multiply corresponding entries of two vectors and add up the result: \[ x^T y=[x_1\ldots x_n] \begin{bmatrix} y_1\\\vdots\\y_n \end{bmatrix}= \sum_{i=1}^n x_iy_i \qquad \text{(scalar)} \]
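
    A small numpy sketch of the dot product (arbitrary values); np.dot and the @ operator give the same scalar:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 5.0, 6.0])
    print(np.dot(x, y))     # 32.0 = 1*4 + 2*5 + 3*6
    print(x @ y)            # same value via the @ operator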

    Dot Product (a.k.a. Inner Product)

    • $x^Ty$ is also $|x||y|\cos(\text{the angle between $x$ and $y$})$
    • If $y$ is a unit vector, then $x^Ty$ gives the length of the component of $x$ that lies in the direction of $y$.
    • Q: if $y$ is not a unit vector, how to compute the length of $x$ projected on $y$?

    Vector Norm

    • Vector norm is a notion of vector length
    • The most common norm is the $\ell_2$ norm: \[ \|x\|_2=\sqrt{\sum_{i=1}^n x_i^2} \]
    • $\ell_2$ norm can be represented by inner product: \[ \|x\|_2=\sqrt{x^T x} \]
    • We can compute distance between points by norms: \[ \text{dist}(x, y)=\|x-y\|_2 \]
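
    A quick numpy check of the $\ell_2$ norm and the distance formula (arbitrary vectors):

    import numpy as np

    x = np.array([3.0, 4.0])
    y = np.array([0.0, 0.0])
    print(np.linalg.norm(x))        # 5.0, the l2 norm
    print(np.sqrt(x @ x))           # 5.0, same value via the inner product
    print(np.linalg.norm(x - y))    # 5.0, distance between x and y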

    Matrix Addition and Scaling

    Similar to vectors, matrices can also be added or scaled:
    • Addition \[ \begin{bmatrix} a & b\\ c & d \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} a+1 & b+2 \\ c+3 & d+4 \end{bmatrix} \]
    • Scaling \[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}\times 3 = \begin{bmatrix} 3a & 3b \\ 3c & 3d \end{bmatrix} \]

    Matrix-Vector Product

    But the most important use of a matrix is that it can transform vectors.

    Matrix-Vector Product

    • Assume \( A= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix}= \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} \in \R{m\times n}, \ x= \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \)
    • Then \( y=Ax= \begin{bmatrix} a_1^T x\\ a_2^T x\\ \vdots \\ a_m^T x\\ \end{bmatrix} \)
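
    A small numpy sketch (arbitrary values): the $i$-th entry of $Ax$ equals the dot product between the $i$-th row of $A$ and $x$.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])                   # A is 3x2
    x = np.array([1.0, -1.0])                    # x is in R^2
    print(A @ x)                                 # [-1. -1. -1.]
    print(np.array([a_i @ x for a_i in A]))      # same values, row by row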

    Scaling

    • Consider a 2D vector $\begin{bmatrix}x\\y\end{bmatrix}$ on the plane
    • We can use a $2\times 2$ diagonal matrix to multiply it: \[ \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} s_x x\\ s_y y \end{bmatrix} \] (verify by yourself that matrix multiplication works out this way)
    • Denote $p=[x,y]^T$, $p'=[x',y']^T$ and \[ S= \begin{bmatrix} s_x & 0\\ 0 & s_y \end{bmatrix} \]
    • $p'=Sp$

    Rotation

    Counter-clockwise rotation by an angle $\theta$
    • The formula is: \[ \begin{bmatrix} x' \\ y' \end{bmatrix}= \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} \]
    • Denote \[ R= \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix} \]
    • $p'=Rp$

    Composition of Transformations

    • Multiple transformation matrices can be used to transform a point: \[ p'=R_2R_1S p \]
    • The effect of this is to apply their transformations one after the other, from right to left
    • In the example above, the result is $$(R_2(R_1(Sp)))$$
    • We can use the rules of matrix multiplication to compute from left to right $$p'=(R_2R_1S)p$$
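
    A numpy sketch of composing transformations (the angles and scales below are arbitrary): applying $S$, $R_1$, $R_2$ one by one gives the same result as multiplying them first and applying the product once.

    import numpy as np

    def rot(theta):
        # 2D counter-clockwise rotation matrix
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    S  = np.diag([2.0, 0.5])              # scaling matrix
    R1 = rot(np.pi / 6)
    R2 = rot(np.pi / 4)
    p  = np.array([1.0, 1.0])

    step_by_step = R2 @ (R1 @ (S @ p))    # apply right to left
    composed     = (R2 @ R1 @ S) @ p      # compose first, then apply
    print(np.allclose(step_by_step, composed))   # True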

    Matrix Product

    • The product of two matrices
    • \[A\in\mathbb{R}^{m\times n}, B\in\mathbb{R}^{n\times p}\] \[C=AB=\begin{bmatrix} - a_1^T -\\ - a_2^T -\\ \vdots\\ - a_m^T - \end{bmatrix} \begin{bmatrix} | & | & & |\\ b_1 & b_2 & \cdots & b_p\\ | & | & & | \end{bmatrix} = \begin{bmatrix} a_1^Tb_1 & a_1^Tb_2 & \cdots & a_1^T b_p\\ a_2^Tb_1 & a_2^Tb_2 & \cdots & a_2^T b_p\\ \vdots & \vdots & \ddots &\vdots \\ a_m^T b_1 & a_m^T b_2 & \cdots & a_m^Tb_p \end{bmatrix} \]
    • $C=AB\in\mathbb{R}^{m\times p}$
    • $C_{ij}=\sum_{k=1}^n A_{ik} B_{kj}$
    • Block matrix view: $AB=A[b_1, b_2, \ldots, b_p]=[Ab_1, Ab_2, \ldots, Ab_p]$

    Matrix Product Rules

    • Associative: $(AB)C=A(BC)$
    • Distributive: $A(B+C)=AB+AC$
    • Caution: matrix multiplication is generally NOT commutative: $AB\neq BA$
      • For example, if $A\in\R{m\times n}$ and $B\in\R{n\times q}$, then matrix product $BA$ does not even exist if $m\neq q$!

    Matrix Product Rules

    • Transpose: flip matrix so that rows become columns, and columns become rows
    • For example: \[ \begin{bmatrix} 0 & 1\\2 & 3\\4 & 5 \end{bmatrix}^T= \begin{bmatrix} 0 & 2 & 4 \\ 1 & 3 & 5 \end{bmatrix} \]
    • A useful identity: \[ (ABC)^T=C^TB^TA^T \]
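
    A quick numerical check of the identity above, with random matrices of compatible shapes:

    import numpy as np

    A = np.random.rand(2, 3)
    B = np.random.rand(3, 4)
    C = np.random.rand(4, 2)
    print(np.allclose((A @ B @ C).T, C.T @ B.T @ A.T))   # True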

    Special Matrices

    • Identity matrix $I$: \[ I_{3\times 3}= \begin{bmatrix} 1 & 0 & 0\\0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} \] When applied to any vector, the vector does not change: $I x= x$.
    • Diagonal matrix. For example: \[ \begin{bmatrix} 3 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 2.5 \end{bmatrix} \] When applied to a vector, each dimension is scaled by the corresponding diagonal element.

    Special Matrices

    • Symmetric matrix: $A^T=A$ \[ \begin{bmatrix} 1 & 2 & 5\\ 2 & 1 & 7\\ 5 & 7 & 1 \end{bmatrix} \] We will see its usage in subsequent lectures.
    • Orthonormal matrix: $A A^T=A^TA=I$
      • Verify that rotation matrix $R$ is an orthonormal matrix

    2D and 3D Homogeneous Transformation

    Limitation of Linear Transformation

    • We learned about scaling and rotation
    • In general, a matrix multiplication lets us linearly combine components of a vector $x$ \[ \begin{bmatrix} a & b\\ c & d \end{bmatrix}\cdot \begin{bmatrix} x_1\\x_2 \end{bmatrix}= \begin{bmatrix} ax_1+bx_2\\ cx_1+dx_2 \end{bmatrix} \]
      • This is sufficient for scaling, rotation, and skew transformations
      • But notice that the output components do not include a constant term! :(

    Limitation of Linear Transformation

    • Geometrically, the origin $[0,0]$ is always transformed to $[0,0]$.
    • Question:
      • can we translate the vertices of a square by a matrix?
      • No, we cannot! Because the origin point does not move by a linear transformation.

    Homogeneous Transformation

    • Homogeneous coordinate:
      • Stick a "1" at the end of every vector in Euclidean space: \( \begin{bmatrix} x \\ y \end{bmatrix} \Rightarrow \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \)
      • Use a 3D vector to represent a 2D point
    • We can use a $3\times 3$ matrix with a special structure to transform the 3D-version of the 2D point: \[ \begin{bmatrix} a & b & c\\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}= \begin{bmatrix} ax + by + c\\dx + ey + f \\ 1 \end{bmatrix} \]

    Homogeneous Transformation

    \[ \begin{bmatrix} a & b & c\\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}= \begin{bmatrix} ax + by + c\\dx + ey + f \\ 1 \end{bmatrix} \]
    • Note that the last element of the transformed point is always 1
    • When decoding the position of the homogeneous coordinate, we discard the last element 1
    • The augmented $3\times 3$ matrix can achieve more than the $2\times 2$ matrix
      • Check that the origin point $[0,0]^T$ is moved to $[c,f]^T$
    • Now we can rotate, scale, and skew like before, AND translate

    Translation

    Translation by Homogeneous Transformation

    To move $p=(x,y)$ to $p'=(x+t_x, y+t_y)$, we take the following steps:
    1. Write down the homogeneous coordinate: \[ p=(x,y)\rightarrow (x,y,1) \]
    2. Build the homogeneous transformation matrix: \[ T= \begin{bmatrix} 1 & 0 & t_x\\0 & 1 & t_y\\0 & 0 & 1 \end{bmatrix}. \quad \text{In block matrix form, written as } T= \begin{bmatrix} I & t\\0 & 1 \end{bmatrix} \]

    Translation by Homogeneous Transformation

    To move $p=(x,y)$ to $p'=(x+t_x, y+t_y)$, we take the following steps:
    Move the point by homogeneous transformation: \[ \begin{bmatrix} x+t_x\\y+t_y\\1 \end{bmatrix}= \begin{bmatrix} 1 & 0 & t_x\\0 & 1 & t_y\\0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} x\\y\\1 \end{bmatrix} \]
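
    A minimal numpy sketch of the translation above (the values of $t_x$, $t_y$ and the point are arbitrary):

    import numpy as np

    tx, ty = 2.0, -1.0
    T = np.array([[1.0, 0.0, tx],
                  [0.0, 1.0, ty],
                  [0.0, 0.0, 1.0]])       # homogeneous translation matrix
    p = np.array([3.0, 4.0, 1.0])         # (x, y) = (3, 4) in homogeneous coordinates
    print(T @ p)                          # [5. 3. 1.], i.e., (x + tx, y + ty)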

    Scaling

    Scaling by Homogeneous Transformation

    To move $p=(x,y)$ to $p'=(s_x x, s_y y)$, we take the following steps:
    1. Write down the homogeneous coordinate: \[ p=(x,y)\rightarrow (x,y,1) \]
    2. Build the homogeneous scaling matrix: \[ S= \begin{bmatrix} s_x & 0 & 0\\0 & s_y & 0\\0 & 0 & 1 \end{bmatrix}. \quad \text{In block matrix form, written as } S= \begin{bmatrix} S' & 0\\0 & 1 \end{bmatrix} \]

    Scaling by Homogeneous Transformation

    To move $p=(x,y)$ to $p'=(s_x x, s_y y)$, we take the following steps:
    Move the point by homogeneous transformation: \[ \begin{bmatrix} s_x x\\s_y y\\1 \end{bmatrix}= \begin{bmatrix} s_x & 0 & 0\\0 & s_y & 0\\0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} x\\y\\1 \end{bmatrix} \]

    Scaling and Translation

    Scaling and Translation

    • We can first scale the leaf and then translate the leaf:
      1. $p'=S\cdot p$
      2. $p''=T\cdot p'$
    • Composing the operations together, we get: \[ p''=T\cdot p'=T\cdot (S\cdot p)=T\cdot S\cdot p \]

    Scaling and Translation

    \begin{align*} p''=T\cdot S\cdot p &= \begin{bmatrix} 1 & 0 & t_x\\ 0 & 1 & t_y\\ 0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}\\ &= {\color{red} \begin{bmatrix} s_x & 0 & t_x\\ 0 & s_y & t_y \\ 0 & 0 & 1 \end{bmatrix}} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}= \begin{bmatrix} s_x x+t_x\\ s_y y +t_y \\ 1 \end{bmatrix}= \begin{bmatrix} S' & t\\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \end{align*}

    Scaling & Translation $\neq$ Translation & Scaling

    \[ \begin{aligned} T\cdot S&= \begin{bmatrix} s_x & 0 & t_x\\0 & s_y & t_y\\ 0 & 0 & 1 \end{bmatrix} &\Rightarrow p''&= \begin{bmatrix} s_x x + t_x\\s_y y + t_y \\ 1 \end{bmatrix} \\ S\cdot T&= \begin{bmatrix} s_x & 0 & s_x t_x\\0 & s_y & s_yt_y\\ 0 & 0 & 1 \end{bmatrix} &\Rightarrow p''&= \begin{bmatrix} s_x x + s_x t_x\\s_y y+s_y t_y\\ 1 \end{bmatrix} \end{aligned} \]
    • Remark: We can move the leaf by first scaling and then translating, or by first translating and then scaling, but we need different translation matrices!

    Rotation by Homogeneous Transformation

    • We can also achieve rotation by homogeneous transformation: \[ R= \begin{bmatrix} R' & 0\\ 0 & 1 \end{bmatrix} \] where $R'= \begin{bmatrix} \cos\theta & -\sin\theta \\\sin\theta & \cos\theta \end{bmatrix} $ is a rotation matrix as we introduced before.

    Rigid Transformation: Rotation + Translation

    \[ p'=T\cdot R\cdot p \] \[ p'=T\cdot R\cdot p= \begin{bmatrix} I & t\\0 & 1 \end{bmatrix} \begin{bmatrix} R' & 0 \\ 0 & 1 \end{bmatrix}\cdot p= {\color{red} \begin{bmatrix} R' & t \\ 0 & 1 \end{bmatrix}}\cdot p \]
    Using 2D rotation and 2D translation, we can freely move an object on a plane
    without changing the shape and size
    We call such transformations "rigid transformations".
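
    A numpy sketch of a 2D rigid transformation, assembling the block matrix $\begin{bmatrix} R' & t\\ 0 & 1\end{bmatrix}$ from an arbitrary angle and translation:

    import numpy as np

    theta = np.pi / 2
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = np.array([1.0, 2.0])

    T_rigid = np.eye(3)
    T_rigid[:2, :2] = R                   # rotation block
    T_rigid[:2, 2] = t                    # translation block
    p = np.array([1.0, 0.0, 1.0])         # point (1, 0) in homogeneous coordinates
    print(T_rigid @ p)                    # [1. 3. 1.]: rotate (1, 0) to (0, 1), then translate by (1, 2)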

    3D Homogeneous Transformations

    • So far we have assumed that shapes are transformed on a 2D plane
    • Homogeneous transformations can also be used to transform shapes in 3D space:
      • Homogeneous coordinate: \( \begin{bmatrix} x\\y\\z \end{bmatrix}\rightarrow \begin{bmatrix} x\\y\\z\\1 \end{bmatrix} \)
      • Homogeneous transformation matrix: \[ R= \begin{bmatrix} R' & 0 \\ 0 & 1 \end{bmatrix}\in\R{4\times 4},\qquad T= \begin{bmatrix} I & t \\ 0 & 1 \end{bmatrix}\in\R{4\times 4},\qquad S= \begin{bmatrix} S' & 0 \\ 0 & 1 \end{bmatrix}\in\R{4\times 4} \]

    Use Matrix to Solve Linear Systems

    Linear Systems

    • In computer vision, we often need to solve a system of linear equations
    • For example, we fit a line to a set of detected corner points in an image: \[ X=\{(x_1,y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \]
    • By the line equation, $y_i=k x_i + b\ \forall i$
    • We have the following equation system: \[ \left\{ \begin{aligned} & y_1=kx_1+b\\ & y_2=kx_2+b\\ & \ldots \\ & y_n=kx_n+b \end{aligned} \right. \]

    Linear Systems

    The system of linear equations can be rewritten using matrix form: \[ \left\{ \begin{aligned} y_1& =kx_1+b\\ y_2& =kx_2+b\\ & \cdots \\ y_n& =kx_n+b \end{aligned} \right. \Longrightarrow \underbrace{ \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \\ \end{bmatrix} }_{\text{coef.}} \underbrace{ \begin{bmatrix} k \\ b \end{bmatrix} }_{\text{unknown}} = \underbrace{ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} }_{\text{bias}} \] Therefore, the line detection problem is a special case of solving the linear equation \[ \underbrace{A}_{\text{coef.}}\underbrace{x}_{\text{unknown}}=\underbrace{b}_{\text{bias}} \]

    Linear Systems

    \[ Ax=b \]
    • Ideally, we hope to solve for $x$ via $x=A^{-1}b$, but this is often not possible!
    • We need additional concepts to answer the following questions:
      • Under what conditions does a solution exist?
        • Is the solution unique?
        • If not, how can we find all the solutions?
      • If we are not able to solve this equation, can we approximately solve it?

    Invertible Matrix

    Linear Independence

    • Suppose we have a set of non-zero vectors $V=\{v_1, \ldots, v_n\}$.
    • If we can express $v_1$ as a linear combination of the other vectors $v_2, \ldots, v_n$, then $v_1$ is linearly dependent on the other vectors
      • For example, $v_1=0.7 v_2-0.7 v_4$
    • If no vector in $V$ is linearly dependent on the others, the set $V$ is linearly independent
    • Equivalently, \[ \sum_{i=1}^n \alpha_i v_i=0 \text{ if and only if } \alpha_i=0 \text{ for all }i=1\ldots n \]
    • Common case: A set of vectors $v_1,\ldots,v_n$ is always linearly independent if each vector is perpendicular to every other vector (and non-zero)

    Linear Independence

    Linearly independent set
    Linearly dependent set
    • Note: If there are more than 2 vectors on a 2D plane, they must be linearly dependent.

    Matrix Rank

    • Column/row rank \[ \begin{aligned} \text{col-rank(A)}&=\text{the maximum number of linearly independent column vectors of $A$}\\ \text{row-rank(A)}&=\text{the maximum number of linearly independent row vectors of $A$}\\ \end{aligned} \]
    • Column rank always equals row rank
    • Matrix rank: \[ \text{rank($A$):=col-rank($A$)=row-rank($A$)} \]
    • For transformation matrices, the rank tells you the dimensions of the output
    • For example, if rank of $A$ is 1, then the transformation \[ p'=Ap \] maps points onto a line.
    • $\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}$ is a matrix with rank 1. Then \[ \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}= \begin{bmatrix} x+y \\ 2x + 2y \end{bmatrix}= (x+y) \begin{bmatrix} 1\\2 \end{bmatrix} \]
    • Justification:
      • rank 1: a maximal set of linearly independent rows is $\{[1, 1]\}$
      • output dimension: every output lies on the line spanned by the vector $\begin{bmatrix}1\\2\end{bmatrix}$

    Matrix Rank

    • If an $m\times m$ matrix has rank $m$, then we say that it is "full rank"
      • Maps an $m\times 1$ vector uniquely to another $m\times 1$ vector
      • An inverse matrix $A^{-1}$ can be found such that $A A^{-1}=A^{-1}A=I$
      • For any $y=Ax$, we can use $x=A^{-1}y$ to identify the unique $x$ given $y$
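
    A numpy sketch (random values): checking the rank of a matrix and, when it is full rank, recovering $x$ from $y=Ax$ with the inverse.

    import numpy as np

    A = np.random.rand(3, 3)              # almost surely full rank
    print(np.linalg.matrix_rank(A))       # 3

    x = np.random.rand(3)
    y = A @ x
    x_recovered = np.linalg.inv(A) @ y
    print(np.allclose(x, x_recovered))    # True

    B = np.array([[1.0, 1.0],
                  [2.0, 2.0]])            # the rank-1 example from the previous slide
    print(np.linalg.matrix_rank(B))       # 1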

    Singular Matrix

    • When $A$ is not invertible, $y=Ax$ is not a one-to-one map between the input and the output.
    • We say that the matrix is "singular"
      • Inverse does not exist
      • No way to look at the result and tell what the input was
    • Let us study what $A$ does by relating the input space and the output space

    Subspace

    Subspace

    • A subspace of a vector space is a subset of a vector space which is itself a vector space
    • Roughly speaking, a subspace is closed under vector addition and scalar multiplication
    • Examples of the subspace of $\R{n}$:
      • $\{0\}$, where $0$ is an $n$-dim zero vector
      • Given $v_i\in\R{n}$ for $i=1\ldots k$, define \[ \text{span}(v_1, v_2, \ldots, v_k)=\{\alpha_1v_1+\ldots+\alpha_kv_k|\alpha_i\in\R{}\}, \] then
        • $\text{span}(v)$ is a line in $\R{n}$ (for $v\neq 0$)
        • $\text{span}(v_1, v_2)$ is a 2D plane in $\R{n}$ (if $v_1, v_2$ are linearly independent)
        • $\text{span}(v_1, v_2, \ldots, v_k)$ is a $k$-dimensional subspace of $\R{n}$ (if the $v_i$ are linearly independent)

    Basis and Dimension

    A set of vectors $\{v_1, v_2,\ldots v_k\}$ is called a basis for a vector space $V$ if
    • $\{v_1, v_2, \ldots, v_k\}$ is linearly independent
    • $V=\text{span}(v_1, v_2, \ldots, v_k)$
    • Every $v\in V$ can be uniquely expressed as \[ v=\alpha_1 v_1 + \ldots + \alpha_k v_k \]
    • For a given vector space $V$, the number of vectors in any basis is the same
    • The number of vectors in any basis is called the dimension of $V$, denoted $\textbf{dim}\ V$

    Nullspace of a Matrix

    The nullspace of $A\in \R{m\times n}$ is a subspace of $\R{n}$, defined as \[ \text{null}(A)=\{x\in\R{n}|Ax=0\} \]
    • $\text{null}(A)$ is the set of vectors mapped to $0$ by $y=Ax$
    • By definition, the inner product between $x$ and each row of $A$ is $0$, thus $x$ is orthogonal to the rows of $A$
    $\text{null}(A)$ gives the ambiguity of solving $y=Ax$:
    • If $y=Ax$ and $z\in\text{null}(A)$, then $y=A(x+z)$
    • Conversely, if $y=Ax$ and $y=A\tilde{x}$, then $x-\tilde{x}=z$ for some $z\in\text{null}(A)$

    Nullspace of a Matrix

    • Example 1: \[ A= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \] then $\text{null}(A)=\text{span}([1, -2, 1]^T)$
    • Example 2: $\text{null}(A)=\{0\}$ for $A\in\R{n\times n}$ if and only if $A$ is full-rank.
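
    We can check Example 1 numerically: every multiple of $[1, -2, 1]^T$ is mapped to $0$, and adding such a vector to $x$ does not change $Ax$ (the scalar 2.5 below is arbitrary).

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    z = np.array([1.0, -2.0, 1.0])        # null-space direction from Example 1
    print(A @ z)                          # [0. 0.]

    x = np.random.rand(3)
    print(np.allclose(A @ x, A @ (x + 2.5 * z)))   # True: same output, different input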

    Range of Matrix

    • The range of $A\in\R{m\times n}$ is defined as \[ \text{range}(A)=\{Ax|x\in\R{n}\}\subseteq\R{m} \]
    • Using the column block representation of $A=[a_1, a_2, \ldots, a_n]$, \[ Ax=[a_1,a_2,\ldots,a_n] \begin{bmatrix} x_1\\x_2\\\vdots\\x_n \end{bmatrix}= \sum_{i=1}^n x_i a_i \]
    • $\text{range}(A)$ can be interpreted as
      • the set of vectors that can be 'hit' by linear mapping $y=Ax$;
      • the span of columns of $A$;
      • the set of vectors $y$ for which $Ax=y$ has a solution.

    The Map by $A$

    Combining concepts we just defined, we can have a number of conclusions for the map \[ y=Ax \] where $A\in\R{m\times n}$:
    • $A$ maps the entire space of $\R{n}$ onto a subspace of $\R{m}$ (the range space);
    • If $\text{rank}(A)=r$, then the dimension of the range space is also $r$;
    • For any $z\in\text{null}(A)$, $Az=0$;
    • If $x$ is mapped to $y$, then $x+z$ is also mapped to $y$ for any $z\in\text{null}(A)$.

    Solving $Ax=b$

    Now we have more tools to understand when we can solve $Ax=b$ ($A$ does not have to be square). We can conclude the following:
    • If $b\not\in \text{range}(A)$, we cannot exactly solve this equation.
    • If $b \in \text{range}(A)$,
      • this equation has at least one solution $x$;
      • for any $z\in\text{null}(A)$, $x+z$ is also a solution;
      • If $\text{rank}(A)=n$
        $\Longrightarrow$ all the columns of $A$ are linearly independent
        $\Longrightarrow$ $\text{null}(A)=\{0\}$
        $\Longrightarrow$ the solution is unique.

    Examples of Solving $Ax=b$

    Consider the following equation system: \[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} x= \begin{bmatrix} 1\\-1 \end{bmatrix} \] Because \( \begin{bmatrix} 1 \\ -1 \end{bmatrix}\not\in \text{range}\left( \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \right) \), the system does not have a solution.
    Justification: the second row of $A$ is twice the first row, so every column of $A$ has its second entry equal to twice its first entry. Since $Ax$ is a linear combination of the columns of $A$, the second entry of $Ax$ must be twice the first entry. But $b$ does not have this structure, so $b\not\in \text{range}(A)$.
    Consider the following equation system: \[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} x= \begin{bmatrix} 1\\2 \end{bmatrix} \] Because \( \begin{bmatrix} 1 \\ 2 \end{bmatrix}\in \text{range}\left( \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \right) \), the system has at least one solution.
    • A special solution is $x=[1, 0, 0]^T$;
    • $\text{rank}(A)=1$;
    • $\text{null}(A)=\text{span}( \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} )$
    • So the general solution is \( x= \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}+\alpha_1 \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}+\alpha_2 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}, \ \alpha_1, \alpha_2 \in \R{} \)
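
    A small numpy check of this general solution: the special solution plus any combination of the null-space vectors satisfies the system (the $\alpha$ values below are arbitrary).

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])
    b = np.array([1.0, 2.0])
    x_special = np.array([1.0, 0.0, 0.0])
    n1 = np.array([-2.0, 1.0, 0.0])
    n2 = np.array([-3.0, 0.0, 1.0])

    alpha1, alpha2 = 0.7, -1.3            # arbitrary coefficients
    x = x_special + alpha1 * n1 + alpha2 * n2
    print(np.allclose(A @ x, b))          # True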

    Least Square Solution

    When Exact Solution Does Not Exist

    • When $b\not\in \text{range}(A)$, we cannot solve $x$ exactly from $Ax=b$.
    • This is the most common case in computer vision :(
    • In our line detection example, corner points are not exactly on a line, thus any line will have some errors

    Least Square Formulation

    • But we may solve the system approximately through this optimization problem: \[ \begin{aligned} &\underset{x}{\text{minimize}}&&\|Ax-b\|^2 \end{aligned} \]
    • Explanation:
      • Assume $A\in\R{m\times n}$, then $Ax=b$ is a system of $m$ linear equations
      • Given some $x\in\R{n}$, $\epsilon=Ax-b\in\R{m}$ is the vector of error residuals for the $m$ equations
      • We want to minimize the sum of squares of all error residuals, i.e., $\|\epsilon\|^2$.

    Solving Least Square

    • Least square is an unconstrained optimization problem
    • Do you remember how to solve an unconstrained optimization problem from the calculus course?
    \[ \begin{aligned} &\underset{x}{\text{minimize}}&& f(x) \end{aligned} \] where $f:\R{n}\to \R{}$

    Solving Least Square

    \[ \begin{aligned} &\underset{x}{\text{minimize}}&& f(x) \end{aligned} \] where $f:\R{n}\to \R{}$
    • Optimality condition (necessary): Gradient is zero
    • Given a function $f(x)$ where $x$ is a vector, the gradient at $x$ is denoted as $\nabla_x f(x)$.
    • The optimality condition is then \[ \nabla_x f(x)=0 \]

    Prereq: Gradient of Vector Functions

    Some basic rules of gradient computation:
    • $\nabla_x (f(x)+g(x))=\nabla_x f(x)+\nabla_x g(x)$
    • For $t\in\R{}$, $\nabla_x(t f(x))=t\nabla_x f(x)$

    Prereq: Gradient of Linear Vector Functions

    Consider a linear vector function: \[ f(x)=w^T x=\sum_i w_i x_i \] The gradient is: \[ \nabla_x f(x)=w \]

    Prereq: Gradient of Quadratic Vector Functions

    Consider a quadratic function $f(x)=x^T A x$ for $A$ being symmetric ($A=A^T$).
    • Remember that \[ f(x)=\sum_i \sum_j A_{ij}x_ix_j \]
    • Taking the partial derivative with respect to $x_k$, after some calculation we have \[ \frac{\partial f(x)}{\partial x_k}=2\sum_{i=1}^n A_{ki} x_i \]

    Prereq: Gradient of Quadratic Vector Functions

    Consider a quadratic function $f(x)=x^T A x$ for $A$ being symmetric ($A=A^T$).
    • This result can be written compactly in matrix form:
      \[ \nabla_x x^TAx=2Ax \]
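
    We can sanity-check the linear and quadratic gradient rules with torch autograd; the sizes and random values below are arbitrary, and $A$ is symmetrized before use.

    import torch

    n = 4
    x = torch.rand(n, requires_grad=True)
    w = torch.rand(n)
    M = torch.rand(n, n)
    A = (M + M.T) / 2                              # make A symmetric

    f = w @ x + x @ (A @ x)                        # f(x) = w^T x + x^T A x
    f.backward()                                   # autograd computes df/dx

    expected = w + 2 * A @ x.detach()              # w from the linear term, 2Ax from the quadratic term
    print(torch.allclose(x.grad, expected))        # True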

    Solving Least Square

    Now we have all the preparations to solve the least square problem: \[ \begin{aligned} &\underset{x}{\text{minimize}}&&\|Ax-b\|^2 \end{aligned} \] First, rewrite the objective as: \[ \begin{aligned} f(x)=\|Ax-b\|^2&=(Ax-b)^T(Ax-b)=(x^TA^T-b^T)(Ax-b)\\ &=x^TA^TAx-2 (b^TA)x+b^Tb \end{aligned} \] Using the gradient rules of linear and quadratic functions, \[ \nabla_x f(x)=2A^TAx-2A^Tb \]

    Solving Least Square

    Now we have all the preparations to solve the least square problem: \[ \begin{aligned} &\underset{x}{\text{minimize}}&&\|Ax-b\|^2 \end{aligned} \] Setting gradient to zero, \[ A^TAx=A^Tb \] The solution to the problem is (assume $A^TA$ is invertible): \[ x=(A^TA)^{-1}A^Tb \]

    Summary of Solving Least Square

    We have derived the solution for least square problem: \[ \begin{aligned} &\underset{x}{\text{minimize}}&&\|Ax-b\|^2 \end{aligned} \] The solution to the problem is (assume $A^TA$ is invertible): \[ x=(A^TA)^{-1}A^Tb \]
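
    A numpy sketch of this formula applied to the earlier line-fitting setup; the noisy points below are synthetic (roughly on the made-up line $y=2x+1$), and we compare against numpy's built-in least-squares solver.

    import numpy as np

    # synthetic noisy points roughly on the line y = 2x + 1
    xs = np.linspace(0, 1, 20)
    ys = 2 * xs + 1 + 0.01 * np.random.randn(20)

    A = np.stack([xs, np.ones_like(xs)], axis=1)   # coefficient matrix with rows [x_i, 1]
    b = ys

    x_ls = np.linalg.inv(A.T @ A) @ A.T @ b        # (A^T A)^{-1} A^T b
    print(x_ls)                                    # approximately [2, 1] = [k, b]

    x_np, *_ = np.linalg.lstsq(A, b, rcond=None)   # numpy's built-in least squares
    print(np.allclose(x_ls, x_np))                 # True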

    Pseudo Inverse

    Summary of Solving $Ax=b$ for All Cases

    We have discussed the solution of $Ax=b$ for all possible cases:
    • $A$ is invertible: there is a unique solution
    • $b\in \text{range}(A)$ and $\text{null}(A)\neq \{0\}$: there is a family of solutions (a special solution plus the null space gives all of them)
    • $b\not\in \text{range}(A)$: there is a least-square approximate solution

    Pseudo Inverse

    In linear algebra, there is a matrix called the pseudo-inverse, defined for every $A$ regardless of its shape and denoted $A^{\dagger}$, so that \[ x^*=A^{\dagger} b \] solves $Ax=b$ in the following sense:
    • when $A$ is invertible, $x^*$ is the unique solution
    • when $b\in \text{range}(A)$ and $\text{null}(A)\neq \{0\}$, $x^*$ is a special solution (in fact, the solution with the least $\|x\|$)
    • when $b\not\in \text{range}(A)$, $x^*$ is the least-squares approximate solution
    Numerically, the pseudo-inverse can be computed by the Singular Value Decomposition (SVD) algorithm, which is beyond the scope of our course.

    Code Example

    Invertible $A$

    
    import numpy as np  # as in the earlier image example

    A = np.random.rand(3, 3)
    b = np.random.rand(3, 1)
    
    x = np.linalg.pinv(A) @ b
    print('A=', A)
    print('b=', b)
    print('x=', x)
    print('Ax-b', A @ x - b)
    
    # A= [[6.97446218e-01 3.47415656e-01 5.93783190e-01]
    #  [3.20606804e-01 2.51155615e-04 8.31997605e-01]
    #  [4.16578041e-01 4.96157978e-01 4.90465357e-01]]
    # b= [[0.59153536]
    #  [0.92799786]
    #  [0.51953575]]
    # x= [[-0.16647919]
    #  [ 0.02089704]
    #  [ 1.179531  ]]
    # Ax-b [[3.33066907e-16]
    #  [1.11022302e-16]
    #  [3.33066907e-16]]
                                
    Fat $A$
    Solution exists but not unique
    
    A = np.random.rand(2, 3)
    b = np.random.rand(2, 1)
    
    x = np.linalg.pinv(A) @ b
    print('A=', A)
    print('b=', b)
    print('x=', x)
    print('Ax-b', A @ x - b)
    
    # A= [[0.05871486 0.033507   0.94260723]
    #  [0.69295901 0.84740045 0.74196673]]
    # b= [[0.63610879]
    #  [0.07260413]]
    # x= [[-0.23861937]
    #  [-0.33346011]
    #  [ 0.70155682]]
    # Ax-b [[4.44089210e-16]
    #  [2.77555756e-16]]
                                
    Tall $A$
    Least square solution
    
    A = np.random.rand(3, 2)
    b = np.random.rand(3, 1)
    
    x = np.linalg.pinv(A) @ b
    print('A=', A)
    print('b=', b)
    print('x=', x)
    print('Ax-b', A @ x - b)
    
    # A= [[0.26404115 0.29162541]
    #  [0.1664837  0.07120312]
    #  [0.87827123 0.31327209]]
    # b= [[0.99762258]
    #  [0.54386582]
    #  [0.86558511]]
    # x= [[-0.28797735]
    #  [ 3.74833987]]
    # Ax-b [[ 0.01945071]
    #  [-0.32491585]
    #  [ 0.05574292]]
                                
    End