
\[ \newcommand{\R}[1]{\mathbb{R}^{#1}} \]

Camera Model

Hao Su

Winter, 2022

Credit: CS231a, Stanford, Silvio Savarese

Agenda


    Pinhole Camera

    Known since Ancient Times

    The earliest known descriptions are found in the Chinese Mozi writings (circa 500 BCE).

    Known since Ancient Times

    • Ibn al-Haytham (965-1040): "the father of modern optics"
    • His great book The Optics (Latin: De aspectibus or Perspectivae) explained the principle of perspective projection. It is also the origin of the word perspective.

    Basic Mechanism

    Definitions

    • We place a pinhole camera so that it faces the $\mathbf{k}$ direction, and the pinhole is at $O$
    • $O$: aperture
    • $\mathbf{k}$: optical axis
    • $\{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$: an orthogonal frame at $O$. Let us call it the camera frame.
    • $\Pi'$: retina plane
    • $f$: focal length
    • $P$: a point in 3D space
    • $P'$: the image of $P$ on retina plane

    Project 3D Points to Retina Plane

    • The coordinate of $P$ in the $O\mathbf{ijk}$ frame is $\begin{bmatrix} x\\y\\z \end{bmatrix}$
    • The coordinate of $P'$ in the $C'\mathbf{i'j'}$ frame on $\Pi'$ is $ \begin{bmatrix} x' \\ y' \end{bmatrix} $

    Project 3D Points to Retina Plane

    • Assume that $O\mathbf{ij}$ plane is parallel to the $\Pi'$ plane, then \( \left\{ \begin{aligned} & x' = f\frac{x}{z}\\ & y' = f\frac{y}{z}\\ \end{aligned} \right. \qquad \) (Why?)

    Project 3D Points to Retina Plane

    \[ \frac{x'}{f}=\frac{x}{z} \]

    Project 3D Points to Retina Plane

    More compactly, we denote the projection as \[ (x,y,z)\rightarrow (f\frac{x}{z},f\frac{y}{z}) \]
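    As a sanity check, the projection above can be sketched in a few lines of Python (a minimal sketch; `project_pinhole` and the numeric values are made up for illustration):

```python
import numpy as np

def project_pinhole(P, f):
    """Pinhole projection: (x, y, z) -> (f*x/z, f*y/z)."""
    x, y, z = P
    return np.array([f * x / z, f * y / z])

# A point 2 m in front of the aperture, with focal length f = 0.05 m
p_img = project_pinhole(np.array([0.4, 0.2, 2.0]), 0.05)
# p_img is [0.01, 0.005]: image coordinates shrink as z grows
```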

    From Retina Plane to Image Plane

    From Retina Plane to Image Plane

    • $i'j'$ frame: retina plane frame
    • $ij$ frame: image plane frame
    • From camera frame to image plane frame \[ (x,y,z)\rightarrow (f\frac{x}{z}+c_x, f\frac{y}{z}+c_y) \]

    Converting to Pixels

    • Assume the unit of $f$ is m
    • Assume that the density of the light sensor is $k$ pixels/m horizontally and $l$ pixels/m vertically. If $k\neq l$, then pixels are non-square.

    Converting to Pixels

    • Then we scale by $k$ and $l$ \[ (x,y,z)\rightarrow (fk\frac{x}{z}+c_x, fl\frac{y}{z}+c_y) \]
    • Here, the units of $c_x$ and $c_y$ are pixels.

    Converting to Pixels

    • Then we scale by $k$ and $l$ \[ (x,y,z)\rightarrow (fk\frac{x}{z}+c_x, fl\frac{y}{z}+c_y) \]
    • Let $\alpha=fk$ and $\beta=fl$, $(x,y,z)\rightarrow (\alpha\frac{x}{z}+c_x, \beta\frac{y}{z}+c_y)$

    Converting to Pixels

    To sum up, so far, the overall transformation is: \[ \begin{bmatrix} x\\y\\z \end{bmatrix} \rightarrow \begin{bmatrix} \alpha\frac{x}{z}+c_x\\ \beta\frac{y}{z}+c_y \end{bmatrix} \] Can we express it as a linear transformation in matrix form?
    No. A linear map computes linear combinations of the coordinates, but our formula contains a division by $z$!

    Homogeneous System and
    Intrinsic Camera Matrix

    Nonetheless, we will use a somewhat hacky way to still
    represent the 3D-to-2D projection as a matrix-vector product

    Homogeneous Coordinates

    • Earlier, we introduced the conversion from Euclidean coordinates to homogeneous coordinates:
      On image plane:
      \[ \begin{bmatrix} x\\y \end{bmatrix}\Rightarrow \begin{bmatrix} x\\y\\1 \end{bmatrix} \]
      In 3D physical space:
      \[ \begin{bmatrix} x\\y\\z \end{bmatrix}\Rightarrow \begin{bmatrix} x\\y\\z\\1 \end{bmatrix} \]
    • Here we introduce a new rule to convert from homogeneous coordinates to Euclidean coordinates:
      On image plane:
      \[ \begin{bmatrix} x\\y\\w \end{bmatrix}\Rightarrow \begin{bmatrix} x/w\\y/w \end{bmatrix} \]
      In 3D physical space:
      \[ \begin{bmatrix} x\\y\\z\\w \end{bmatrix}\Rightarrow \begin{bmatrix} x/w\\y/w\\z/w \end{bmatrix} \]
    Now we have the division!
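    The two conversion rules can be written as a pair of helper functions (a minimal sketch; the function names are made up):

```python
import numpy as np

def to_homogeneous(p):
    """Euclidean -> homogeneous: append w = 1."""
    return np.append(p, 1.0)

def to_euclidean(p_h):
    """Homogeneous -> Euclidean: divide by the last coordinate w."""
    return p_h[:-1] / p_h[-1]

# [6, 8, 2] and [3, 4, 1] are the same image point after the division
p1 = to_euclidean(np.array([6.0, 8.0, 2.0]))
p2 = to_euclidean(np.array([3.0, 4.0, 1.0]))
```

    Homogeneous coordinates that differ by a nonzero scale map to the same Euclidean point; the conversion back is exactly the division we were missing.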

    Projective Transformation in the
    Homogeneous Coordinate System

    1. 3D E$\rightarrow$ 3D H: \( P= \begin{bmatrix} x\\y\\z \end{bmatrix}\rightarrow P_h=\begin{bmatrix} x\\y\\z\\1 \end{bmatrix} \)
    2. Build a homogeneous transformation matrix: \( T= \begin{bmatrix} \alpha & 0 & c_x & 0\\ 0 & \beta & c_y & 0\\ 0 & 0 & 1 & 0\\ \end{bmatrix} \)
    3. 3D H $\rightarrow$ 2D H: \( P_h'=TP_h= \begin{bmatrix} \alpha x + c_x z\\ \beta y + c_y z\\ z \end{bmatrix} \)
    4. 2D H$\rightarrow$ 2D E: \( P'=\begin{bmatrix} \alpha \frac{x}{z}+c_x\\ \beta \frac{y}{z}+c_y \end{bmatrix} \)
    $P$ (Euclidean in 3D)
    $\downarrow$
    $P_h$ (Homogeneous in 3D)
    $\downarrow$
    $P_h'$ (Homogeneous in 2D)
    $\downarrow$
    $P'$ (Euclidean in 2D)
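    The four steps above can be traced numerically (a sketch; the values of $\alpha$, $\beta$, $c_x$, $c_y$ and the point are made-up):

```python
import numpy as np

# Step 2: homogeneous transformation matrix (made-up intrinsics)
alpha, beta, c_x, c_y = 500.0, 500.0, 320.0, 240.0
T = np.array([[alpha, 0.0,  c_x, 0.0],
              [0.0,   beta, c_y, 0.0],
              [0.0,   0.0,  1.0, 0.0]])

P = np.array([0.2, -0.1, 2.0])          # Euclidean 3D point
P_h = np.append(P, 1.0)                 # Step 1: 3D E -> 3D H
P_h_prime = T @ P_h                     # Step 3: 3D H -> 2D H
P_prime = P_h_prime[:2] / P_h_prime[2]  # Step 4: 2D H -> 2D E
# P_prime equals (alpha*x/z + c_x, beta*y/z + c_y) = (370, 215)
```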

    Camera Skewness

    • $\mathbf{k}$-axis may be skewed and not perpendicular to $\Pi'$
    • The skewness will affect the homogeneous transformation matrix
    • When projected on the retina plane, the $\mathbf{i}$-axis and $\mathbf{j}$-axis have an angle $\theta$ \[ P_h'= \begin{bmatrix} \alpha & -\alpha \cot \theta & c_x & 0\\ 0 & \frac{\beta}{\sin\theta} & c_y & 0\\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \]
    So far, we have 5 parameters ($\alpha$, $\beta$, $\theta$, $c_x$, $c_y$) that affect the imaging process

    Intrinsic Camera Matrix

    From \( P_h'= \begin{bmatrix} \alpha & -\alpha \cot \theta & c_x & 0\\ 0 & \frac{\beta}{\sin\theta} & c_y & 0\\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \),
    we can extract the intrinsic camera matrix: \( K= \begin{bmatrix} \alpha & -\alpha \cot \theta & c_x \\ 0 & \frac{\beta}{\sin\theta} & c_y \\ 0 & 0 & 1 \end{bmatrix} \)
    So $P_h'=K[I, 0]P_h$

    Intrinsic Camera Matrix

    • The common practice is to use another 5-parameter representation of $K$: \[ K= \begin{bmatrix} f_x & s & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1 \end{bmatrix} \]
    • In practice, $K$ can often be accessed through the camera's SDK
      • For example, here is a StackOverflow post that discusses extracting the intrinsic camera matrix of the ARKit of Apple
    • We will also introduce algorithms to estimate $K$ in subsequent lectures
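    In code, $K$ in the $(f_x, f_y, s, c_x, c_y)$ parameterization and the factored projection $P_h'=K[I, 0]P_h$ look like this (a sketch; the parameter values are made up):

```python
import numpy as np

f_x, f_y, s, c_x, c_y = 500.0, 480.0, 2.0, 320.0, 240.0  # made-up values
K = np.array([[f_x, s,   c_x],
              [0.0, f_y, c_y],
              [0.0, 0.0, 1.0]])

# P_h' = K [I | 0] P_h for a point in the camera frame
I0 = np.hstack([np.eye(3), np.zeros((3, 1))])
P_h = np.array([0.2, -0.1, 2.0, 1.0])
P_h_prime = K @ I0 @ P_h
P_prime = P_h_prime[:2] / P_h_prime[2]  # pixel coordinates
```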

    Extrinsic Camera Matrix

    Camera Frame

    • The previous derivations assume that the coordinate of $P$ is in the $O\mathbf{ijk}$ frame
    • Note that the $O\mathbf{ijk}$ frame is bound to the camera, so it is referred to as the camera frame
    • In practice, the camera may move around, so using the camera frame to record object location is inconvenient

    World Frame

    • So we assume a static world frame to record object coordinates, and also record the pose of the camera
    • $O_w\mathbf{i}_w\mathbf{j}_w\mathbf{k}_w$ is the world frame
    • The coordinate of $P$ in world frame is \[ P_w= \begin{bmatrix} x_w\\y_w\\z_w\\1 \end{bmatrix} \]

    Extrinsic Camera Matrix

    • We can use $(R,t)$ to transform the world frame coordinate to the camera frame coordinate (homogeneous): \[ P_h = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}P_w \]

    Extrinsic Camera Matrix

    • \( \begin{bmatrix} R & t\\0 & 1 \end{bmatrix} \) is called the extrinsic camera matrix. There are 6 parameters (3 in $R$ and 3 in $t$)

    Projective Transformation from World Frame

    • Recall the projection from camera frame to image plane by intrinsic camera matrix: \[ P_h'=K[I, 0]P_h \]
    • We just derived that \[ P_h = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}P_w \]
    • Composed together, we can transform from world frame to image plane: \[ P_h'=K \begin{bmatrix} I & 0 \end{bmatrix} \begin{bmatrix} R & t\\ 0 & 1 \end{bmatrix}P_w= K \begin{bmatrix} R & t \end{bmatrix} P_w \]
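    The composition can be checked numerically (a sketch; $R$, $t$, and $K$ are made-up example values):

```python
import numpy as np

theta = np.deg2rad(30.0)                 # made-up camera pose
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.1, -0.2, 0.5])
K = np.array([[500.0,   0.0, 320.0],     # made-up intrinsics, zero skew
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# M = K [R | t] maps homogeneous world coordinates to homogeneous pixels
M = K @ np.hstack([R, t.reshape(3, 1)])

P_w = np.array([1.0, 0.5, 4.0, 1.0])     # homogeneous world point
P_h_prime = M @ P_w
P_prime = P_h_prime[:2] / P_h_prime[2]

# Same result as going through the camera frame explicitly
P_cam = R @ P_w[:3] + t
P_check = (K @ P_cam)[:2] / P_cam[2]
```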

    Camera Matrix

    \[ P_h'= K \begin{bmatrix} R & t \end{bmatrix} P_w \]
    • $M=K \begin{bmatrix} R & t \end{bmatrix}\in\R{3\times 4}$ includes all the parameters of a pinhole camera to form images!
    • We refer to $M$ as the camera matrix
    • While $M$ has 12 numbers, not all matrices in $\R{3\times 4}$ are valid camera matrices
    • We can use the 5 parameters in $K$ and 6 parameters in $[R,t]$ to generate all valid $M$'s
    • In the literature, the common expression is that $M$ has 11 degrees of freedom

    Properties of Projective Transformations

    Projective transformation has been used by artists since the Renaissance!

    The Healing of the Cripple and Raising of Tabitha, by Masolino (1426–1427)
    Cappella Brancacci, Santa Maria del Carmine, Florence

    We will explain properties by examples and derive them mathematically.

    Points Project to Points

    Points Project to Points

    Mathematically, the projection of a point in 3D world frame can be uniquely determined by a function, which is the projective transformation function: \[ P_h'= K \begin{bmatrix} R & t \end{bmatrix} P_w \]

    Parallel Lines Meet

    Parallel Lines Meet

    In 3D, the points on a line passing through $\vec{P}_0=[x_0, y_0, z_0]^T$ can be parameterized by: \[ P_w= \begin{bmatrix} x_0\\y_0\\z_0\\1 \end{bmatrix}+ s \begin{bmatrix} d_x\\d_y\\d_z\\0 \end{bmatrix} \] where $\vec{d}=[d_x, d_y, d_z]^T$ is the direction of the line.

    Parallel Lines Meet

    • By our projective transformation, \[ \begin{aligned} P_h'=K[R, t] \left( \begin{bmatrix} \vec{P}_0\\1 \end{bmatrix}+s \begin{bmatrix} \vec{d}\\0 \end{bmatrix} \right) =K(R\vec{P}_0+t)+sKR\vec{d} \end{aligned} \]
    • Suppose that the camera location is fixed. When the point moves along the line, only $s$ changes.
    • So we introduce two constant vectors: \[ \vec v_1=R\vec P_0+t,\qquad \vec v_2=R\vec{d} \]
    • Then we have \[ P_h'=K(\vec v_1+s\vec v_2) \]

    Parallel Lines Meet

    \[ P_h'=K(\vec v_1+s\vec v_2), \qquad \vec v_1=R\vec P_0+t,\qquad \vec v_2=R\vec{d} \]
    • We make use of the structure of $K$: \[ K= \begin{bmatrix} & \vec k_1^T & \\ & \vec k_2^T & \\ 0 & 0 & 1 \end{bmatrix} \]
    • Therefore, \[ P_h'= \begin{bmatrix} & \vec k_1^T & \\ & \vec k_2^T & \\ 0 & 0 & 1 \end{bmatrix}(\vec v_1+s\vec v_2) =\begin{bmatrix} \vec k_1^T \vec v_1 + s \vec k_1^T \vec v_2\\ \vec k_2^T \vec v_1 + s \vec k_2^T \vec v_2\\ \vec v_{1,3}+s\vec{v}_{2,3} \end{bmatrix} \Rightarrow P'= \begin{bmatrix} \frac{\vec k_1^T \vec v_1 + s \vec k_1^T \vec v_2}{\vec v_{1,3}+s\vec{v}_{2,3}}\\ \frac{\vec k_2^T \vec v_1 + s \vec k_2^T \vec v_2}{\vec v_{1,3}+s\vec{v}_{2,3}} \end{bmatrix} \]

    Parallel Lines Meet

    \[ P_h'=K(\vec v_1+s\vec v_2), \qquad \vec v_1=R\vec P_0+t,\qquad \vec v_2=R\vec{d},\qquad P'= \begin{bmatrix} \frac{\vec k_1^T \vec v_1 + s \vec k_1^T \vec v_2}{\vec v_{1,3}+s\vec{v}_{2,3}}\\ \frac{\vec k_2^T \vec v_1 + s \vec k_2^T \vec v_2}{\vec v_{1,3}+s\vec{v}_{2,3}} \end{bmatrix} \]
    • Assume that $\vec v_{2,3}\neq 0$
      • The 3D point goes to infinity as $s\to \infty$, so \( P'\to \begin{bmatrix} \frac{\vec k_1^T\vec v_2}{\vec v_{2,3}}\\ \frac{\vec k_2^T\vec v_2}{\vec v_{2,3}}\\ \end{bmatrix} \).
      • This point is called the vanishing point
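    We can watch this limit numerically: as $s$ grows, the projected point approaches $K\vec v_2$ divided by its third component (a sketch with made-up $K$ and line parameters, and identity extrinsics):

```python
import numpy as np

K = np.array([[500.0,   0.0, 320.0],   # made-up intrinsics
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
P0 = np.array([0.0, 0.0, 2.0])         # a point on the line
d = np.array([1.0, 0.5, 1.0])          # line direction with d_z != 0

v1 = R @ P0 + t
v2 = R @ d

def project(s):
    """Project the line point at parameter s to Euclidean pixel coords."""
    p = K @ (v1 + s * v2)
    return p[:2] / p[2]

vp = (K @ v2)[:2] / v2[2]              # predicted vanishing point: (820, 490)
# project(s) approaches vp as s -> infinity
```

    Replacing $\vec P_0$ by any other point on a parallel line (same $\vec d$, hence same $\vec v_2$) gives the same limit, which is why parallel 3D lines meet at a common vanishing point in the image.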

    Parallel Lines Meet

    \[ P_h'=K(\vec v_1+s\vec v_2), \qquad \vec v_1=R\vec P_0+t,\qquad \vec v_2=R\vec{d},\qquad P'= \begin{bmatrix} \frac{\vec k_1^T \vec v_1 + s \vec k_1^T \vec v_2}{\vec v_{1,3}+s\vec{v}_{2,3}}\\ \frac{\vec k_2^T \vec v_1 + s \vec k_2^T \vec v_2}{\vec v_{1,3}+s\vec{v}_{2,3}} \end{bmatrix} \]
    • When $\vec v_{2,3} = 0$:
      • After projection, the lines intersect at infinity; in other words, they remain parallel in the image
      • When does this happen?

    Modern Work on Vanishing Point Prediction

    Vanishing points provide crucial information about the 3D structure of the scene. Applications include
    • camera calibration
    • single-view 3D scene reconstruction
    • autonomous navigation
    • semantic scene parsing
    While vanishing points have long been known, research on them remains active.
    For example, NeurVPS: Neural Vanishing Point Scanning via Conic Convolution, Zhou et al., NeurIPS 2019
    End