
\[ \newcommand{\R}[1]{\mathbb{R}^{#1}} \]

Linear Algebra

Hao Su

Winter, 2022

Note: the slides are based on CSE131 (Juan Carlos et al.) and EE263 (by Stephen Boyd et al.) at Stanford.

Agenda


    Course Logistics

    Exams and Assignments

    Exams (20%)
    • A final exam
    • No in-class exams
    Homework (80%)
    • Programming and Q&A based problems
    • Weekly homeworks without late days
    • You can drop 1 assignment without penalty
    • Expected time to solve all problems in each homework: $\sim 5$ hours
    • Each homework solicits your feedback on teaching (1 free credit)
    Some extra credit opportunities

    Collaboration Policy

    • You can work individually or form study groups of any number of members
    • For study groups,
      • each member must create and submit their own solutions to all problems
      • please clearly indicate the names of your collaborators and the problems you discussed

    Linear Algebra in Computer Vision

    Matrix

    • A matrix $A$ is an array of numbers with size $m$ by $n$, i.e., $m$ rows and $n$ columns \[ A= \begin{bmatrix} a_{11} & a_{12} & a_{13} & \ldots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \ldots & a_{2n} \\ \vdots & & & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \ldots & a_{mn} \end{bmatrix} \]
    • If $m=n$, we say that $A$ is square.
    • To denote the shape of $A$, we often write $A\in\R{m\times n}$

    Images

    • Python represents an image as a matrix of pixel brightness values
    • Note that the upper-left corner is at index $(y,x)=(0,0)$

    Color Images

    • Grayscale images have one number per pixel, and are stored as an $m\times n$ matrix
    • Color images have 3 numbers per pixel — red, green, and blue brightness (RGB)
    • Stored as an $m\times n\times 3$ array (height $\times$ width $\times$ channel)
    • Note: an array with more than two indices, such as an image with channels, is also called a "tensor"

    Code Example: Load an Image

    
    from PIL import Image # a library for image IO
    import numpy as np # a library for linear algebra
    import torch # a library for deep learning
    
    file_path = "./cvpr.png"
    im = Image.open(file_path)
    
    # display the image on screen
    im.show() 
    
    # convert the image to numpy format and then to torch format, and normalize pixel value scale to [0, 1]
    image_tensor = torch.FloatTensor(np.array(im) / 255)
    
    # print the size of the matrix
    print(image_tensor.shape) 
    # out: 
    # torch.Size([853, 1259, 3])
    
    # print the upper-left corner of the first channel of the image
    print(image_tensor[:3, :3, 0])
    # out: 
    # tensor([[0.4510, 0.4471, 0.4471],
    #         [0.4549, 0.4549, 0.4549],
    #         [0.4549, 0.4549, 0.4549]])
                                

    Vector

    • When the number of columns of a matrix is 1, it is called a vector: \[ v= \begin{bmatrix} v_1\\v_2\\\vdots\\v_n \end{bmatrix} \]
    • We often denote the shape of $v$ as $v\in\R{n}$
    • A row vector is $v^T\in\R{1\times n}$, where \[ v^T=[v_1, v_2, \ldots, v_n] \] and $T$ denotes the transpose operation

    Example: Image Embedding

    Place images on the plane so that
    clothes with similar styles are closer \[ \text{position}(I)= \begin{bmatrix} x\\ y \end{bmatrix} \] $\text{position}(\cdot)$ is a function of the image matrix $I$
    http://proc-x.com/2017/09/a-vgg-like-cnn-for-fashion-mnist-with-94-accuracy/

    Image Understanding as Linear Algebra

    • Matrices are the natural representation of images
    • In linear algebra, we have learned a great deal about matrices
    • So we need a solid grasp of linear algebra to understand images

    Basic Roles of Matrices and Vectors in CV

    • To represent images (we have shown examples)
    • To represent geometries, e.g.,
      • 3D vectors are natural representations of points in physical space
      • The quadratic equation of ellipses can be represented as $x^TAx=1$
    • To build transformations to operate on images or geometries, e.g.,
      • Build image filters using matrices to sharpen or smooth images
      • Deep learning uses matrices to learn feature extractors
    Next, we review vectors and matrices as
    data representation and transformations

    Basic Vector and Matrix Operations

    Vector Addition and Scaling

    • Addition \[ \begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} a+1 \\ b+2 \end{bmatrix} \]
    • Scaling \[ \begin{bmatrix} a \\ b \end{bmatrix}\times 3 = \begin{bmatrix} 3a \\ 3b \end{bmatrix} \]
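
    A quick numpy sketch of these two operations (the vector values below are arbitrary examples):

    import numpy as np  # numerical linear algebra library

    v = np.array([1.0, 2.0])            # stands in for [a, b]^T
    print(v + np.array([1.0, 2.0]))     # addition: [2. 4.]
    print(3 * v)                        # scaling:  [3. 6.]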

    Geometry of Vector Addition

    • A 2D vector is an arrow in 2D plane
    • A 3D vector is an arrow in 3D space
    • Vector addition composes vectors

    Dot Product (a.k.a. Inner Product)

    • Multiply corresponding entries of two vectors and add up the result: \[ x^T y=[x_1\ldots x_n] \begin{bmatrix} y_1\\\vdots\\y_n \end{bmatrix}= \sum_{i=1}^n x_iy_i \qquad \text{(scalar)} \]
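
    A small numpy sketch of the dot product (arbitrary values); np.dot and the @ operator give the same scalar:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 5.0, 6.0])
    print(np.dot(x, y))     # 32.0 = 1*4 + 2*5 + 3*6
    print(x @ y)            # same value via the @ operator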

    Dot Product (a.k.a. Inner Product)

    • $x^Ty$ is also $|x||y|\cos(\text{the angle between $x$ and $y$})$
    • If $y$ is a unit vector, then $x^Ty$ gives the length of the component of $x$ that lies in the direction of $y$.
    • Q: if $y$ is not a unit vector, how to compute the length of $x$ projected on $y$?

    Vector Norm

    • Vector norm is a notion of vector length
    • The most common norm is the $\ell_2$ norm: \[ \|x\|_2=\sqrt{\sum_{i=1}^n x_i^2} \]
    • $\ell_2$ norm can be represented by inner product: \[ \|x\|_2=\sqrt{x^T x} \]
    • We can compute distance between points by norms: \[ \text{dist}(x, y)=\|x-y\|_2 \]
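
    A quick numpy check of the $\ell_2$ norm and the distance formula (arbitrary vectors):

    import numpy as np

    x = np.array([3.0, 4.0])
    y = np.array([0.0, 0.0])
    print(np.linalg.norm(x))        # 5.0, the l2 norm
    print(np.sqrt(x @ x))           # 5.0, same value via the inner product
    print(np.linalg.norm(x - y))    # 5.0, distance between x and y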

    Matrix Addition and Scaling

    Similar to vectors, matrices can also be added or scaled:
    • Addition \[ \begin{bmatrix} a & b\\ c & d \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} a+1 & b+2 \\ c+3 & d+4 \end{bmatrix} \]
    • Scaling \[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}\times 3 = \begin{bmatrix} 3a & 3b \\ 3c & 3d \end{bmatrix} \]

    Matrix-Vector Product

    But the most important use of a matrix is that it can transform vectors.

    Matrix-Vector Product

    • Assume \( A= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix}= \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} \in \R{m\times n}, \ x= \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \)
    • Then \( y=Ax= \begin{bmatrix} a_1^T x\\ a_2^T x\\ \vdots \\ a_m^T x\\ \end{bmatrix} \)
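
    A small numpy sketch (arbitrary values): the $i$-th entry of $Ax$ equals the dot product between the $i$-th row of $A$ and $x$.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])                   # A is 3x2
    x = np.array([1.0, -1.0])                    # x is in R^2
    print(A @ x)                                 # [-1. -1. -1.]
    print(np.array([a_i @ x for a_i in A]))      # same values, row by row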

    Scaling

    • Consider a 2D vector $\begin{bmatrix}x\\y\end{bmatrix}$ on the plane
    • We can use a $2\times 2$ diagonal matrix to multiply it: \[ \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} s_x x\\ s_y y \end{bmatrix} \] (verify by yourself that matrix multiplication works out this way)
    • Denote $p=[x,y]^T$, $p'=[x',y']^T$ and \[ S= \begin{bmatrix} s_x & 0\\ 0 & s_y \end{bmatrix} \]
    • $p'=Sp$

    Rotation

    Counter-clockwise rotation by an angle $\theta$
    • The formula is: \[ \begin{bmatrix} x' \\ y' \end{bmatrix}= \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} \]
    • Denote \[ R= \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix} \]
    • $p'=Rp$

    Composition of Transformations

    • Multiple transformation matrices can be used to transform a point: \[ p'=R_2R_1S p \]
    • The effect of this is to apply their transformations one after the other, from right to left
    • In the example above, the result is $$(R_2(R_1(Sp)))$$
    • We can use the rules of matrix multiplication to compute from left to right $$p'=(R_2R_1S)p$$
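
    A numpy sketch of composing transformations (the angles and scales below are arbitrary): applying $S$, $R_1$, $R_2$ one by one gives the same result as multiplying them first and applying the product once.

    import numpy as np

    def rot(theta):
        # 2D counter-clockwise rotation matrix
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    S  = np.diag([2.0, 0.5])              # scaling matrix
    R1 = rot(np.pi / 6)
    R2 = rot(np.pi / 4)
    p  = np.array([1.0, 1.0])

    step_by_step = R2 @ (R1 @ (S @ p))    # apply right to left
    composed     = (R2 @ R1 @ S) @ p      # compose first, then apply
    print(np.allclose(step_by_step, composed))   # True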

    Matrix Product

    • The product of two matrices
    • \[A\in\mathbb{R}^{m\times n}, B\in\mathbb{R}^{n\times p}\] \[C=AB=\begin{bmatrix} - a_1^T -\\ - a_2^T -\\ \vdots\\ - a_m^T - \end{bmatrix} \begin{bmatrix} | & | & & |\\ b_1 & b_2 & \cdots & b_p\\ | & | & & | \end{bmatrix} = \begin{bmatrix} a_1^Tb_1 & a_1^Tb_2 & \cdots & a_1^T b_p\\ a_2^Tb_1 & a_2^Tb_2 & \cdots & a_2^T b_p\\ \vdots & \vdots & \ddots &\vdots \\ a_m^T b_1 & a_m^T b_2 & \cdots & a_m^Tb_p \end{bmatrix} \]
    • $C=AB\in\mathbb{R}^{m\times p}$
    • $C_{ij}=\sum_{k=1}^n A_{ik} B_{kj}$
    • Block matrix view: $AB=A[b_1, b_2, \ldots, b_p]=[Ab_1, Ab_2, \ldots, Ab_p]$

    Matrix Product Rules

    • Associative: $(AB)C=A(BC)$
    • Distributive: $A(B+C)=AB+AC$
    • Caution: matrix multiplication is generally NOT commutative: $AB\neq BA$
      • For example, if $A\in\R{m\times n}$ and $B\in\R{n\times q}$, then matrix product $BA$ does not even exist if $m\neq q$!

    Matrix Product Rules

    • Transpose: flip matrix so that rows become columns, and columns become rows
    • For example: \[ \begin{bmatrix} 0 & 1\\2 & 3\\4 & 5 \end{bmatrix}^T= \begin{bmatrix} 0 & 2 & 4 \\ 1 & 3 & 5 \end{bmatrix} \]
    • A useful identity: \[ (ABC)^T=C^TB^TA^T \]
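
    A quick numerical check of the identity above, with random matrices of compatible shapes:

    import numpy as np

    A = np.random.rand(2, 3)
    B = np.random.rand(3, 4)
    C = np.random.rand(4, 2)
    print(np.allclose((A @ B @ C).T, C.T @ B.T @ A.T))   # True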

    Special Matrices

    • Identity matrix $I$: \[ I_{3\times 3}= \begin{bmatrix} 1 & 0 & 0\\0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} \] When applied to any vector, the vector does not change: $I x= x$.
    • Diagonal matrix. For example: \[ \begin{bmatrix} 3 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 2.5 \end{bmatrix} \] When applied to a vector, each dimension is scaled by the corresponding diagonal element.

    Special Matrices

    • Symmetric matrix: $A^T=A$ \[ \begin{bmatrix} 1 & 2 & 5\\ 2 & 1 & 7\\ 5 & 7 & 1 \end{bmatrix} \] We will see its usage in subsequent lectures.
    • Orthonormal matrix: $A A^T=A^TA=I$
      • Verify that rotation matrix $R$ is an orthonormal matrix

    2D and 3D Homogeneous Transformation

    Limitation of Linear Transformation

    • We learned about scaling and rotation
    • In general, a matrix multiplication lets us linearly combine components of a vector $x$ \[ \begin{bmatrix} a & b\\ c & d \end{bmatrix}\cdot \begin{bmatrix} x_1\\x_2 \end{bmatrix}= \begin{bmatrix} ax_1+bx_2\\ cx_1+dx_2 \end{bmatrix} \]
      • This is sufficient for scaling, rotation, and skew transformations
      • But notice that the output components do not include a constant term! :(

    Limitation of Linear Transformation

    • Geometrically, the origin $[0,0]$ is always transformed to $[0,0]$.
    • Question:
      • can we translate the vertices of a square by a matrix?
      • No, we cannot! Because the origin point does not move by a linear transformation.

    Homogeneous Transformation

    • Homogeneous coordinate:
      • Stick a "1" at the end of every vector in Euclidean space: \( \begin{bmatrix} x \\ y \end{bmatrix} \Rightarrow \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \)
      • Use a 3D vector to represent a 2D point
    • We can use a $3\times 3$ matrix with a special structure to transform the 3D-version of the 2D point: \[ \begin{bmatrix} a & b & c\\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}= \begin{bmatrix} ax + by + c\\dx + ey + f \\ 1 \end{bmatrix} \]

    Homogeneous Transformation

    \[ \begin{bmatrix} a & b & c\\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}= \begin{bmatrix} ax + by + c\\dx + ey + f \\ 1 \end{bmatrix} \]
    • Note that the last element of the transformed point is always 1
    • When decoding the position of the homogeneous coordinate, we discard the last element 1
    • The augmented $3\times 3$ matrix can achieve more than the $2\times 2$ matrix
      • Check that the origin point $[0,0]^T$ is moved to $[c,f]^T$
    • Now we can rotate, scale, and skew like before, AND translate

    Translation

    Translation by Homogeneous Transformation

    To move $p=(x,y)$ to $p'=(x+t_x, y+t_y)$, we take the following steps:
    1. Write down the homogeneous coordinate: \[ p=(x,y)\rightarrow (x,y,1) \]
    2. Build the homogeneous transformation matrix: \[ T= \begin{bmatrix} 1 & 0 & t_x\\0 & 1 & t_y\\0 & 0 & 1 \end{bmatrix}. \quad \text{In block matrix form, written as } T= \begin{bmatrix} I & t\\0 & 1 \end{bmatrix} \]

    Translation by Homogeneous Transformation

    To move $p=(x,y)$ to $p'=(x+t_x, y+t_y)$, we take the following steps:
    Move the point by homogeneous transformation: \[ \begin{bmatrix} x+t_x\\y+t_y\\1 \end{bmatrix}= \begin{bmatrix} 1 & 0 & t_x\\0 & 1 & t_y\\0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} x\\y\\1 \end{bmatrix} \]
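
    A minimal numpy sketch of the translation above (the values of $t_x$, $t_y$ and the point are arbitrary):

    import numpy as np

    tx, ty = 2.0, -1.0
    T = np.array([[1.0, 0.0, tx],
                  [0.0, 1.0, ty],
                  [0.0, 0.0, 1.0]])       # homogeneous translation matrix
    p = np.array([3.0, 4.0, 1.0])         # (x, y) = (3, 4) in homogeneous coordinates
    print(T @ p)                          # [5. 3. 1.], i.e., (x + tx, y + ty)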

    Scaling

    Scaling by Homogeneous Transformation

    To move $p=(x,y)$ to $p'=(s_x x, s_y y)$, we take the following steps:
    1. Write down the homogeneous coordinate: \[ p=(x,y)\rightarrow (x,y,1) \]
    2. Build the homogeneous scaling matrix: \[ S= \begin{bmatrix} s_x & 0 & 0\\0 & s_y & 0\\0 & 0 & 1 \end{bmatrix}. \quad \text{In block matrix form, written as } S= \begin{bmatrix} S' & 0\\0 & 1 \end{bmatrix} \]

    Scaling by Homogeneous Transformation

    To move $p=(x,y)$ to $p'=(s_x x, s_y y)$, we take the following steps:
    Move the point by homogeneous transformation: \[ \begin{bmatrix} s_x x\\s_y y\\1 \end{bmatrix}= \begin{bmatrix} s_x & 0 & 0\\0 & s_y & 0\\0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} x\\y\\1 \end{bmatrix} \]

    Scaling and Translation

    Scaling and Translation

    • We can first scale the leaf and then translate the leaf:
      1. $p'=S\cdot p$
      2. $p''=T\cdot p'$
    • Composing the operations together, we get: \[ p''=T\cdot p'=T\cdot (S\cdot p)=T\cdot S\cdot p \]

    Scaling and Translation

    \begin{align*} p''=T\cdot S\cdot p &= \begin{bmatrix} 1 & 0 & t_x\\ 0 & 1 & t_y\\ 0 & 0 & 1 \end{bmatrix}\cdot \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}\\ &= {\color{red} \begin{bmatrix} s_x & 0 & t_x\\ 0 & s_y & t_y \\ 0 & 0 & 1 \end{bmatrix}} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}= \begin{bmatrix} s_x x+t_x\\ s_y y +t_y \\ 1 \end{bmatrix}= \begin{bmatrix} S' & t\\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \end{align*}

    Scaling & Translation $\neq$ Translation & Scaling

    \[ \begin{aligned} T\cdot S&= \begin{bmatrix} s_x & 0 & t_x\\0 & s_y & t_y\\ 0 & 0 & 1 \end{bmatrix} &\Rightarrow p''&= \begin{bmatrix} s_x x + t_x\\s_y y + t_y \\ 1 \end{bmatrix} \\ S\cdot T&= \begin{bmatrix} s_x & 0 & s_x t_x\\0 & s_y & s_yt_y\\ 0 & 0 & 1 \end{bmatrix} &\Rightarrow p''&= \begin{bmatrix} s_x x + s_x t_x\\s_y y+s_y t_y\\ 1 \end{bmatrix} \end{aligned} \]
    • Remark: We can move the leaf by first scaling and then translating, or by first translating and then scaling, but we need different translation matrices!

    Rotation by Homogeneous Transformation

    • We can also achieve rotation by homogeneous transformation: \[ R= \begin{bmatrix} R' & 0\\ 0 & 1 \end{bmatrix} \] where $R'= \begin{bmatrix} \cos\theta & -\sin\theta \\\sin\theta & \cos\theta \end{bmatrix} $ is a rotation matrix as we introduced before.

    Rigid Transformation: Rotation + Translation

    \[ p'=T\cdot R\cdot p \] \[ p'=T\cdot R\cdot p= \begin{bmatrix} I & t\\0 & 1 \end{bmatrix} \begin{bmatrix} R' & 0 \\ 0 & 1 \end{bmatrix}\cdot p= {\color{red} \begin{bmatrix} R' & t \\ 0 & 1 \end{bmatrix}}\cdot p \]
    Using 2D rotation and 2D translation, we can freely move an object on a plane
    without changing the shape and size
    We call such transformations "rigid transformations".
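
    A numpy sketch of a 2D rigid transformation, assembling the block matrix $\begin{bmatrix} R' & t\\ 0 & 1\end{bmatrix}$ from an arbitrary angle and translation:

    import numpy as np

    theta = np.pi / 2
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = np.array([1.0, 2.0])

    T_rigid = np.eye(3)
    T_rigid[:2, :2] = R                   # rotation block
    T_rigid[:2, 2] = t                    # translation block
    p = np.array([1.0, 0.0, 1.0])         # point (1, 0) in homogeneous coordinates
    print(T_rigid @ p)                    # [1. 3. 1.]: rotate (1, 0) to (0, 1), then translate by (1, 2)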

    3D Homogeneous Transformations

    • So far we have assumed that shapes are transformed on a 2D plane
    • Homogeneous transformations can also be used to transform shapes in 3D space:
      • Homogeneous coordinate: \( \begin{bmatrix} x\\y\\z \end{bmatrix}\rightarrow \begin{bmatrix} x\\y\\z\\1 \end{bmatrix} \)
      • Homogeneous transformation matrix: \[ R= \begin{bmatrix} R' & 0 \\ 0 & 1 \end{bmatrix}\in\R{4\times 4},\qquad T= \begin{bmatrix} I & t \\ 0 & 1 \end{bmatrix}\in\R{4\times 4},\qquad S= \begin{bmatrix} S' & 0 \\ 0 & 1 \end{bmatrix}\in\R{4\times 4} \]

    Use Matrix to Solve Linear Systems

    Linear Systems

    • In computer vision, we often need to solve a system of linear equations
    • For example, we fit a line to a set of detected corner points in an image: \[ X=\{(x_1,y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \]
    • By the line equation, $y_i=k x_i + b\ \forall i$
    • We have the following equation system: \[ \left\{ \begin{aligned} & y_1=kx_1+b\\ & y_2=kx_2+b\\ & \ldots \\ & y_n=kx_n+b \end{aligned} \right. \]

    Linear Systems

    The system of linear equations can be rewritten using matrix form: \[ \left\{ \begin{aligned} y_1& =kx_1+b\\ y_2& =kx_2+b\\ & \cdots \\ y_n& =kx_n+b \end{aligned} \right. \Longrightarrow \underbrace{ \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \\ \end{bmatrix} }_{\text{coef.}} \underbrace{ \begin{bmatrix} k \\ b \end{bmatrix} }_{\text{unknown}} = \underbrace{ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} }_{\text{bias}} \] Therefore, the line detection problem is a special case of solving the linear equation \[ \underbrace{A}_{\text{coef.}}\underbrace{x}_{\text{unknown}}=\underbrace{b}_{\text{bias}} \]

    Linear Systems

    \[ Ax=b \]
    • Ideally, we hope to solve for $x$ via $x=A^{-1}b$, but this is often not possible!
    • We need additional concepts to answer the following questions:
      • Under what conditions does a solution exist?
        • Is the solution unique?
        • If not, how can we find all the solutions?
      • If we are not able to solve this equation, can we approximately solve it?

    Invertible Matrix

    Linear Independence

    • Suppose we have a set of non-zero vectors $V=\{v_1, \ldots, v_n\}$.
    • If we can express $v_1$ as a linear combination of the other vectors $v_2, \ldots, v_n$, then $v_1$ is linearly dependent on the other vectors
      • For example, $v_1=0.7 v_2-0.7 v_4$
    • If no vector in $V$ is linearly dependent on the others, the set $V$ is linearly independent
    • Equivalently, \[ \sum_{i=1}^n \alpha_i v_i=0 \text{ if and only if } \alpha_i=0 \text{ for all }i=1\ldots n \]
    • Common case: A set of vectors $v_1,\ldots,v_n$ is always linearly independent if each vector is perpendicular to every other vector (and non-zero)

    Linear Independence

    Linearly independent set
    Linearly dependent set
    • Note: If there are more than 2 vectors on a 2D plane, they must be linearly dependent.

    Matrix Rank

    • Column/row rank \[ \begin{aligned} \text{col-rank(A)}&=\text{the maximum number of linearly independent column vectors of $A$}\\ \text{row-rank(A)}&=\text{the maximum number of linearly independent row vectors of $A$}\\ \end{aligned} \]
    • Column rank always equals row rank
    • Matrix rank: \[ \text{rank($A$):=col-rank($A$)=row-rank($A$)} \]
    • For transformation matrices, the rank tells you the dimensions of the output
    • For example, if rank of $A$ is 1, then the transformation \[ p'=Ap \] maps points onto a line.
    • $\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}$ is a matrix with rank 1. Then \[ \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}= \begin{bmatrix} x+y \\ 2x + 2y \end{bmatrix}= (x+y) \begin{bmatrix} 1\\2 \end{bmatrix} \]
    • Justification:
      • rank 1: a maximal set of linearly independent rows is $\{[1, 1]\}$
      • output dimension: every output lies on the line spanned by the vector $\begin{bmatrix}1\\2\end{bmatrix}$

    Matrix Rank

    • If an $m\times m$ matrix has rank $m$, then we say that it is "full rank"
      • Maps an $m\times 1$ vector uniquely to another $m\times 1$ vector
      • An inverse matrix $A^{-1}$ can be found such that $A A^{-1}=A^{-1}A=I$
      • For any $y=Ax$, we can use $x=A^{-1}y$ to identify the unique $x$ given $y$
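
    A numpy sketch (random values): checking the rank of a matrix and, when it is full rank, recovering $x$ from $y=Ax$ with the inverse.

    import numpy as np

    A = np.random.rand(3, 3)              # almost surely full rank
    print(np.linalg.matrix_rank(A))       # 3

    x = np.random.rand(3)
    y = A @ x
    x_recovered = np.linalg.inv(A) @ y
    print(np.allclose(x, x_recovered))    # True

    B = np.array([[1.0, 1.0],
                  [2.0, 2.0]])            # the rank-1 example from the previous slide
    print(np.linalg.matrix_rank(B))       # 1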

    Singular Matrix

    • When $A$ is not invertible, $y=Ax$ is not a one-to-one map between the input and the output.
    • We say that the matrix is "singular"
      • Inverse does not exist
      • No way to look at the result and tell what the input was
    • Let us study what $A$ does by relating the input space and the output space

    Subspace

    Subspace

    • A subspace of a vector space is a subset of a vector space which is itself a vector space
    • Roughly speaking, a subspace is closed under vector addition and scalar multiplication
    • Examples of the subspace of $\R{n}$:
      • $\{0\}$, where $0$ is an $n$-dim zero vector
      • Given $v_i\in\R{n}$ for $i=1\ldots k$, define \[ \text{span}(v_1, v_2, \ldots, v_k)=\{\alpha_1v_1+\ldots+\alpha_kv_k|\alpha_i\in\R{}\}, \] then
        • $\text{span}(v)$ is a line in $\R{n}$ (for $v\neq 0$)
        • $\text{span}(v_1, v_2)$ is a 2D plane in $\R{n}$ (if $v_1, v_2$ are linearly independent)
        • $\text{span}(v_1, v_2, \ldots, v_k)$ is a $k$-dimensional subspace of $\R{n}$ (if the $v_i$ are linearly independent)

    Basis and Dimension

    A set of vectors $\{v_1, v_2,\ldots v_k\}$ is called a basis for a vector space $V$ if
    • $\{v_1, v_2, \ldots, v_k\}$ is linearly independent
    • $V=\text{span}(v_1, v_2, \ldots, v_k)$
    • Every $v\in V$ can be uniquely expressed as \[ v=\alpha_1 v_1 + \ldots + \alpha_k v_k \]
    • For a given vector space $V$, the number of vectors in any basis is the same
    • The number of vectors in any basis is called the dimension of $V$, denoted $\textbf{dim}\ V$

    Nullspace of a Matrix

    The nullspace of $A\in \R{m\times n}$ is a subspace of $\R{n}$, defined as \[ \text{null}(A)=\{x\in\R{n}|Ax=0\} \]
    • $\text{null}(A)$ is the set of vectors mapped to $0$ by $y=Ax$
    • By definition, the inner product between $x$ and each row of $A$ is $0$, thus $x$ is orthogonal to the rows of $A$
    $\text{null}(A)$ gives the ambiguity of solving $y=Ax$:
    • If $y=Ax$ and $z\in\text{null}(A)$, then $y=A(x+z)$
    • Conversely, if $y=Ax$ and $y=A\tilde{x}$, then $x-\tilde{x}=z$ for some $z\in\text{null}(A)$

    Nullspace of a Matrix

    • Example 1: \[ A= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \] then $\text{null}(A)=\text{span}([1, -2, 1]^T)$
    • Example 2: $\text{null}(A)=\{0\}$ for $A\in\R{n\times n}$ if and only if $A$ is full-rank.
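
    We can check Example 1 numerically: every multiple of $[1, -2, 1]^T$ is mapped to $0$, and adding such a vector to $x$ does not change $Ax$ (the scalar 2.5 below is arbitrary).

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    z = np.array([1.0, -2.0, 1.0])        # null-space direction from Example 1
    print(A @ z)                          # [0. 0.]

    x = np.random.rand(3)
    print(np.allclose(A @ x, A @ (x + 2.5 * z)))   # True: same output, different input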

    Range of Matrix

    • The range of $A\in\R{m\times n}$ is defined as \[ \text{range}(A)=\{Ax|x\in\R{n}\}\subseteq\R{m} \]
    • Using the column block representation of $A=[a_1, a_2, \ldots, a_n]$, \[ Ax=[a_1,a_2,\ldots,a_n] \begin{bmatrix} x_1\\x_2\\\vdots\\x_n \end{bmatrix}= \sum_{i=1}^n x_i a_i \]
    • $\text{range}(A)$ can be interpreted as
      • the set of vectors that can be 'hit' by linear mapping $y=Ax$;
      • the span of columns of $A$;
      • the set of vectors $y$ for which $Ax=y$ has a solution.

    The Map by $A$

    Combining concepts we just defined, we can have a number of conclusions for the map \[ y=Ax \] where $A\in\R{m\times n}$:
    • $A$ maps the entire space of $\R{n}$ onto a subspace of $\R{m}$ (the range space);
    • If $\text{rank}(A)=r$, then the dimension of the range space is also $r$;
    • For any $z\in\text{null}(A)$, $Az=0$;
    • If $x$ is mapped to $y$, then $x+z$ is also mapped to $y$ for any $z\in\text{null}(A)$.

    Solving $Ax=b$

    Now we have more tools to understand when we can solve $Ax=b$ ($A$ does not have to be square). We can conclude the following:
    • If $b\not\in \text{range}(A)$, we cannot exactly solve this equation.
    • If $b \in \text{range}(A)$,
      • this equation has at least one solution $x$;
      • for any $z\in\text{null}(A)$, $x+z$ is also a solution;
      • If $\text{rank}(A)=n$
        $\Longrightarrow$ all the columns of $A$ are linearly independent
        $\Longrightarrow$ $\text{null}(A)=\{0\}$
        $\Longrightarrow$ the solution is unique.

    Examples of Solving $Ax=b$

    Consider the following equation system: \[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} x= \begin{bmatrix} 1\\-1 \end{bmatrix} \] Because \( \begin{bmatrix} 1 \\ -1 \end{bmatrix}\not\in \text{range}\left( \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \right) \), the system does not have a solution.
    Justification: the second row of $A$ is twice the first row, so every column of $A$ has its second entry equal to twice its first entry. Since $Ax$ is a linear combination of the columns of $A$, the second entry of $Ax$ must be twice the first entry. But $b$ does not have this structure, so $b\not\in \text{range}(A)$.
    Consider the following equation system: \[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} x= \begin{bmatrix} 1\\2 \end{bmatrix} \] Because \( \begin{bmatrix} 1 \\ 2 \end{bmatrix}\in \text{range}\left( \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \right) \), the system has at least one solution.
    • A special solution is $x=[1, 0, 0]^T$;
    • $\text{rank}(A)=1$;
    • $\text{null}(A)=\text{span}( \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} )$
    • So the general solution is \( x= \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}+\alpha_1 \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}+\alpha_2 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}, \ \alpha_1, \alpha_2 \in \R{} \)
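
    A small numpy check of this general solution: the special solution plus any combination of the null-space vectors satisfies the system (the $\alpha$ values below are arbitrary).

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])
    b = np.array([1.0, 2.0])
    x_special = np.array([1.0, 0.0, 0.0])
    n1 = np.array([-2.0, 1.0, 0.0])
    n2 = np.array([-3.0, 0.0, 1.0])

    alpha1, alpha2 = 0.7, -1.3            # arbitrary coefficients
    x = x_special + alpha1 * n1 + alpha2 * n2
    print(np.allclose(A @ x, b))          # True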

    Least Square Solution

    When Exact Solution Does Not Exist

    • When $b\not\in \text{range}(A)$, we cannot solve $x$ exactly from $Ax=b$.
    • This is the most common case in computer vision :(
    • In our line detection example, corner points are not exactly on a line, thus any line will have some errors

    Least Square Formulation

    • But we may solve the system approximately through this optimization problem: \[ \begin{aligned} &\underset{x}{\text{minimize}}&&\|Ax-b\|^2 \end{aligned} \]
    • Explanation:
      • Assume $A\in\R{m\times n}$, then $Ax=b$ is a system of $m$ linear equations
      • Given some $x\in\R{n}$, $\epsilon=Ax-b\in\R{m}$ is the vector of error residuals for the $m$ equations
      • We want to minimize the sum of squares of all error residuals, i.e., $\|\epsilon\|^2$.

    Solving Least Square

    • Least square is an unconstrained optimization problem
    • Do you remember how to solve an unconstrained optimization problem from the calculus course?
    \[ \begin{aligned} &\underset{x}{\text{minimize}}&& f(x) \end{aligned} \] where $f:\R{n}\to \R{}$

    Solving Least Square

    \[ \begin{aligned} &\underset{x}{\text{minimize}}&& f(x) \end{aligned} \] where $f:\R{n}\to \R{}$
    • Optimality condition (necessary): Gradient is zero
    • Given a function $f(x)$ where $x$ is a vector, the gradient at $x$ is denoted as $\nabla_x f(x)$.
    • The optimality condition is then \[ \nabla_x f(x)=0 \]

    Prereq: Gradient of Vector Functions

    Some basic rules of gradient computation:
    • $\nabla_x (f(x)+g(x))=\nabla_x f(x)+\nabla_x g(x)$
    • For $t\in\R{}$, $\nabla_x(t f(x))=t\nabla_x f(x)$

    Prereq: Gradient of Linear Vector Functions

    Consider a linear vector function: \[ f(x)=w^T x=\sum_i w_i x_i \] The gradient is: \[ \nabla_x f(x)=w \]

    Prereq: Gradient of Quadratic Vector Functions

    Consider a quadratic function $f(x)=x^T A x$ for $A$ being symmetric ($A=A^T$).
    • Remember that \[ f(x)=\sum_i \sum_j A_{ij}x_ix_j \]
    • Taking the partial derivative with respect to $x_k$, after some calculation we have \[ \frac{\partial f(x)}{\partial x_k}=2\sum_{i=1}^n A_{ki} x_i \]

    Prereq: Gradient of Quadratic Vector Functions

    Consider a quadratic function $f(x)=x^T A x$ for $A$ being symmetric ($A=A^T$).
    • This result can be written compactly in matrix form:
      \[ \nabla_x x^TAx=2Ax \]
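
    We can sanity-check the linear and quadratic gradient rules with torch autograd; the sizes and random values below are arbitrary, and $A$ is symmetrized before use.

    import torch

    n = 4
    x = torch.rand(n, requires_grad=True)
    w = torch.rand(n)
    M = torch.rand(n, n)
    A = (M + M.T) / 2                              # make A symmetric

    f = w @ x + x @ (A @ x)                        # f(x) = w^T x + x^T A x
    f.backward()                                   # autograd computes df/dx

    expected = w + 2 * A @ x.detach()              # w from the linear term, 2Ax from the quadratic term
    print(torch.allclose(x.grad, expected))        # True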

    Solving Least Square

    Now we have all the preparations to solve the least square problem: \[ \begin{aligned} &\underset{x}{\text{minimize}}&&\|Ax-b\|^2 \end{aligned} \] First, rewrite the objective as: \[ \begin{aligned} f(x)=\|Ax-b\|^2&=(Ax-b)^T(Ax-b)=(x^TA^T-b^T)(Ax-b)\\ &=x^TA^TAx-2 (b^TA)x+b^Tb \end{aligned} \] Using the gradient rules of linear and quadratic functions, \[ \nabla_x f(x)=2A^TAx-2A^Tb \]

    Solving Least Square

    Now we have all the preparations to solve the least square problem: \[ \begin{aligned} &\underset{x}{\text{minimize}}&&\|Ax-b\|^2 \end{aligned} \] Setting gradient to zero, \[ A^TAx=A^Tb \] The solution to the problem is (assume $A^TA$ is invertible): \[ x=(A^TA)^{-1}A^Tb \]

    Summary of Solving Least Square

    We have derived the solution for least square problem: \[ \begin{aligned} &\underset{x}{\text{minimize}}&&\|Ax-b\|^2 \end{aligned} \] The solution to the problem is (assume $A^TA$ is invertible): \[ x=(A^TA)^{-1}A^Tb \]
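
    A numpy sketch of this formula applied to the earlier line-fitting setup; the noisy points below are synthetic (roughly on the made-up line $y=2x+1$), and we compare against numpy's built-in least-squares solver.

    import numpy as np

    # synthetic noisy points roughly on the line y = 2x + 1
    xs = np.linspace(0, 1, 20)
    ys = 2 * xs + 1 + 0.01 * np.random.randn(20)

    A = np.stack([xs, np.ones_like(xs)], axis=1)   # coefficient matrix with rows [x_i, 1]
    b = ys

    x_ls = np.linalg.inv(A.T @ A) @ A.T @ b        # (A^T A)^{-1} A^T b
    print(x_ls)                                    # approximately [2, 1] = [k, b]

    x_np, *_ = np.linalg.lstsq(A, b, rcond=None)   # numpy's built-in least squares
    print(np.allclose(x_ls, x_np))                 # True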

    Pseudo Inverse

    Summary of Solving $Ax=b$ for All Cases

    We have discussed the solution of $Ax=b$ for all possible cases:
    • $A$ is invertible: there is a unique solution
    • $b\in \text{range}(A)$ and $\text{null}(A)\neq \{0\}$: there is a family of solutions (a special solution plus the null space gives all of them)
    • $b\not\in \text{range}(A)$: there is a least-square approximate solution

    Pseudo Inverse

    In linear algebra, there is a matrix called the pseudo-inverse, defined for every $A$ regardless of its shape and denoted $A^{\dagger}$, so that \[ x^*=A^{\dagger} b \] solves $Ax=b$ in the following sense:
    • when $A$ is invertible, $x^*$ is the unique solution
    • when $b\in \text{range}(A)$ and $\text{null}(A)\neq \{0\}$, $x^*$ is a special solution (in fact, the solution with the least $\|x\|$)
    • when $b\not\in \text{range}(A)$, $x^*$ is the least-squares approximate solution
    Numerically, the pseudo-inverse can be computed by the Singular Value Decomposition (SVD) algorithm, which is beyond the scope of our course.

    Code Example

    Invertible $A$

    
    import numpy as np  # as in the earlier image example

    A = np.random.rand(3, 3)
    b = np.random.rand(3, 1)
    
    x = np.linalg.pinv(A) @ b
    print('A=', A)
    print('b=', b)
    print('x=', x)
    print('Ax-b', A @ x - b)
    
    # A= [[6.97446218e-01 3.47415656e-01 5.93783190e-01]
    #  [3.20606804e-01 2.51155615e-04 8.31997605e-01]
    #  [4.16578041e-01 4.96157978e-01 4.90465357e-01]]
    # b= [[0.59153536]
    #  [0.92799786]
    #  [0.51953575]]
    # x= [[-0.16647919]
    #  [ 0.02089704]
    #  [ 1.179531  ]]
    # Ax-b [[3.33066907e-16]
    #  [1.11022302e-16]
    #  [3.33066907e-16]]
                                
    Fat $A$
    Solution exists but not unique
    
    A = np.random.rand(2, 3)
    b = np.random.rand(2, 1)
    
    x = np.linalg.pinv(A) @ b
    print('A=', A)
    print('b=', b)
    print('x=', x)
    print('Ax-b', A @ x - b)
    
    # A= [[0.05871486 0.033507   0.94260723]
    #  [0.69295901 0.84740045 0.74196673]]
    # b= [[0.63610879]
    #  [0.07260413]]
    # x= [[-0.23861937]
    #  [-0.33346011]
    #  [ 0.70155682]]
    # Ax-b [[4.44089210e-16]
    #  [2.77555756e-16]]
                                
    Tall $A$
    Least square solution
    
    A = np.random.rand(3, 2)
    b = np.random.rand(3, 1)
    
    x = np.linalg.pinv(A) @ b
    print('A=', A)
    print('b=', b)
    print('x=', x)
    print('Ax-b', A @ x - b)
    
    # A= [[0.26404115 0.29162541]
    #  [0.1664837  0.07120312]
    #  [0.87827123 0.31327209]]
    # b= [[0.99762258]
    #  [0.54386582]
    #  [0.86558511]]
    # x= [[-0.28797735]
    #  [ 3.74833987]]
    # Ax-b [[ 0.01945071]
    #  [-0.32491585]
    #  [ 0.05574292]]
                                
    End