Camera Conventions
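A homogeneous point [X, Y, Z, 1] in world coordinates can be projected to a
homogeneous point [x, y, 1] in image (pixel) coordinates using the
following equation:
\[
\lambda
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
R_{00} & R_{01} & R_{02} & t_0 \\
R_{10} & R_{11} & R_{12} & t_1 \\
R_{20} & R_{21} & R_{22} & t_2
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]
where \(\lambda\) is the depth of the point along the camera’s \(Z\) axis.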
We follow the standard OpenCV-style camera coordinate system as illustrated at the beginning of the documentation.
Camera Coordinates
Right-handed, with \(X\) pointing right, \(Y\) pointing down, and \(Z\) pointing away from the camera along the viewing direction. Note that the OpenCV convention (camtools’ default) differs from the OpenGL/Blender convention, where \(X\) points right, \(Y\) points up, and \(Z\) points opposite to the viewing direction (i.e., the camera looks along \(-Z\)).
To convert between the OpenCV camera coordinates and the OpenGL-style coordinates, use the conversion functions:
- ct.convert.T_opencv_to_opengl()
- ct.convert.T_opengl_to_opencv()
- ct.convert.pose_opencv_to_opengl()
- ct.convert.pose_opengl_to_opencv()
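The functions above are camtools’ own API. As a rough illustration of the math involved (an assumption on our part, not camtools’ source), the two conventions differ only by flipping the camera’s \(Y\) and \(Z\) axes. The helper names pose_opencv_to_opengl_sketch and T_opencv_to_opengl_sketch below are hypothetical:

```python
import numpy as np

# Assumption: OpenCV camera axes (X right, Y down, Z forward) become OpenGL
# camera axes (X right, Y up, Z backward) by flipping Y and Z. The flip is
# its own inverse, so the same matrix converts in both directions.
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])


def pose_opencv_to_opengl_sketch(pose):
    """Hypothetical helper: convert a (4, 4) C2W pose between conventions."""
    # Right-multiplying re-expresses the camera axes; the camera center
    # pose[:3, 3] is unchanged.
    return pose @ FLIP_YZ


def T_opencv_to_opengl_sketch(T):
    """Hypothetical helper: convert a (4, 4) W2C extrinsic between conventions."""
    # T = inv(pose), so inv(pose @ F) = inv(F) @ T = F @ T.
    return FLIP_YZ @ T


# Usage: a camera at the world origin looking along world +Z (OpenCV).
T_cv = np.eye(4)
point_world = np.array([0.0, 0.0, 2.0, 1.0])
print((T_cv @ point_world)[2])                             # 2.0: in front is +Z
print((T_opencv_to_opengl_sketch(T_cv) @ point_world)[2])  # -2.0: in front is -Z
```

Applying the flip twice returns the original matrix, which is why a single matrix serves for both directions in this sketch.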
Image Coordinates
The origin is at the top-left corner of the image, with \(x\) pointing right (along the image width) and \(y\) pointing down (along the image height). This is consistent with OpenCV.
Note that the 0th dimension of the image array is the height (i.e., \(y\)) and the 1st dimension is the width (i.e., \(x\)). That is:
\(x\) <=> \(u\) <=> width <=> column <=> the 1st dimension
\(y\) <=> \(v\) <=> height <=> row <=> the 0th dimension
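A small NumPy illustration of this mapping; the image size and pixel location are arbitrary choices:

```python
import numpy as np

height, width = 4, 6
# Image arrays have shape (height, width, channels), so they are
# indexed as im[row, col] == im[y, x].
im = np.zeros((height, width, 3), dtype=np.uint8)

# Pixel coordinates: x runs along the width (column), y along the height (row).
x, y = 5, 2
im[y, x] = [255, 0, 0]  # note the order: y first, then x

print(im.shape[:2])                    # (4, 6), i.e., (height, width)
print(np.argwhere(im[..., 0] == 255))  # [[2 5]], i.e., [[row, col]] == [[y, x]]
```

Writing im[x, y] instead would either touch the wrong pixel or raise an IndexError when x >= height.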
Matrix Definitions
Camera Intrinsic (K)
K is a (3, 3) camera intrinsic matrix:
K = [[fx,  s, cx],
     [ 0, fy, cy],
     [ 0,  0,  1]]
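Here fx and fy are the focal lengths in pixels, (cx, cy) is the principal point, and s is the skew, which is 0 for most cameras. A minimal sketch of applying K to a point that is already in camera coordinates (the numeric values are made up for illustration):

```python
import numpy as np

fx, fy, cx, cy, s = 500.0, 500.0, 320.0, 240.0, 0.0
K = np.array([[fx,  s, cx],
              [0., fy, cy],
              [0., 0., 1.]])

# A point in camera coordinates (OpenCV: X right, Y down, Z forward).
X_cam = np.array([0.2, -0.1, 2.0])

# Apply K, then divide by the last component (the depth) to get pixels.
xy_h = K @ X_cam          # homogeneous pixel coordinate, scaled by depth
xy = xy_h[:2] / xy_h[2]
print(xy)                 # [370. 215.]
```

Because the last row of K is [0, 0, 1], the third component of K @ X_cam is exactly the depth Z, which is the \(\lambda\) in the projection equation.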
Camera Extrinsic (T or W2C)
T is a (4, 4) camera extrinsic matrix:
T = [[R | t],   = [[R00, R01, R02, t0],
     [0 | 1]]      [R10, R11, R12, t1],
                   [R20, R21, R22, t2],
                   [  0,   0,   0,  1]]
- T is also known as the world-to-camera (W2C) matrix, which transforms a point in world coordinates to camera coordinates.
- T’s shape is (4, 4), not (3, 4).
- T is the inverse of pose, i.e., np.linalg.inv(T) == pose (up to floating-point error).
- The camera center C in world coordinates is transformed to [0, 0, 0, 1] in camera coordinates, i.e., T @ np.append(C, 1) == [0, 0, 0, 1].
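A minimal NumPy sketch of these properties. The rotation (30 degrees about the \(Y\) axis) and the translation are arbitrary illustrative values:

```python
import numpy as np

# Arbitrary example extrinsics.
theta = np.deg2rad(30.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, -1.0, 4.0])

# Assemble the (4, 4) world-to-camera matrix T = [[R | t], [0 | 1]].
T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t
print(T.shape)  # (4, 4)

# pose (C2W) is the inverse of T.
pose = np.linalg.inv(T)

# The camera center, read from pose, lands on the camera-space origin.
C = pose[:3, 3]
print(np.allclose(T @ np.append(C, 1.0), [0.0, 0.0, 0.0, 1.0]))  # True
```

Keeping the bottom row [0, 0, 0, 1] is what makes T invertible and lets T and pose be chained with plain matrix products.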
Rotation Matrix (R)
R is a (3, 3) rotation matrix:
R = T[:3, :3]
- R is a rotation matrix. It is orthogonal with determinant 1: orthogonality preserves lengths and angles, and a determinant of +1 preserves orientation (no reflection).
- R.T == np.linalg.inv(R)
- np.linalg.norm(R @ x) == np.linalg.norm(x), where x is a (3,) vector.
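These properties can be checked numerically. The sketch below uses an arbitrary rotation of 45 degrees about the \(Z\) axis, and contrasts it with a reflection, which is orthogonal but not a rotation:

```python
import numpy as np

# Arbitrary example: a 45-degree rotation about the Z axis.
theta = np.deg2rad(45.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
x = np.array([3.0, 4.0, 12.0])  # any (3,) vector; its norm is 13

print(np.allclose(R.T, np.linalg.inv(R)))                    # True: orthogonal
print(np.isclose(np.linalg.det(R), 1.0))                     # True: orientation preserved
print(np.isclose(np.linalg.norm(R @ x), np.linalg.norm(x)))  # True: length preserved

# A reflection is also orthogonal but has determinant -1,
# so it is not a valid rotation matrix.
F = np.diag([1.0, 1.0, -1.0])
print(np.allclose(F.T, np.linalg.inv(F)), np.isclose(np.linalg.det(F), -1.0))  # True True
```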
Translation Vector (t)
t is a (3,) translation vector:
t = T[:3, 3]
- t’s shape is (3,), not (3, 1).
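Why the (3,) shape matters: NumPy broadcasting treats a (3,) vector and a (3, 1) column differently. The sketch below (arbitrary values) transforms points stored row-wise in an (N, 3) array; with a (3, 1) t and N == 3, the addition silently broadcasts to a wrong result of the same shape instead of raising an error:

```python
import numpy as np

R = np.eye(3)
t = np.array([10.0, 20.0, 30.0])      # shape (3,), as in t = T[:3, 3]
points = np.array([[0.0, 0.0, 0.0],   # N = 3 points, one per row
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# Correct: t is added to every row, giving R @ p + t for each point p.
good = points @ R.T + t
print(good[0])                        # [10. 20. 30.]

# Wrong: a (3, 1) t is added down the rows instead of across the columns.
bad = points @ R.T + t[:, None]
print(bad[0])                         # [10. 10. 10.]
```

When a column vector is genuinely needed (for example to build [R | t]), it is clearer to reshape at the point of use with t[:, None] than to store t as (3, 1).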
Camera Pose (pose or C2W)
pose is a (4, 4) camera pose matrix. It is the inverse of T.
- pose is also known as the camera-to-world (C2W) matrix, which transforms a point in camera coordinates to world coordinates.
- pose is the inverse of T, i.e., pose == np.linalg.inv(T).
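The columns of pose have a direct geometric reading: the first three columns are the camera’s \(X\) (right), \(Y\) (down) and \(Z\) (viewing direction) axes expressed in world coordinates, and the last column is the camera center. A small sketch with an arbitrary pose (illustrative values only):

```python
import numpy as np

# Arbitrary example: a camera at world (-5, 0, 0) looking along world +X.
# The rotation's columns are the camera's right, down and forward axes.
R_c2w = np.array([[ 0.0, 0.0, 1.0],
                  [ 0.0, 1.0, 0.0],
                  [-1.0, 0.0, 0.0]])
pose = np.eye(4)
pose[:3, :3] = R_c2w
pose[:3, 3] = [-5.0, 0.0, 0.0]

right, down, forward = pose[:3, 0], pose[:3, 1], pose[:3, 2]
center = pose[:3, 3]
print(forward)                                      # [1. 0. 0.]
print(np.allclose(np.cross(right, down), forward))  # True: right-handed

# pose maps camera coordinates to world coordinates; T = inv(pose) maps back.
T = np.linalg.inv(pose)
p_cam = np.array([0.0, 0.0, 5.0, 1.0])  # 5 units in front of the camera
p_world = pose @ p_cam
print(np.allclose(p_world[:3], 0.0))    # True: reaches the world origin
print(np.allclose(T @ p_world, p_cam))  # True
```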
Camera Center (C)
C is the camera center:
C = pose[:3, 3]
- C’s shape is (3,), not (3, 1).
- C is the camera center in world coordinates. It is also the translation vector of pose.
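Since T maps the camera center to the camera-space origin, R @ C + t = 0, so C can also be computed without inverting a (4, 4) matrix as C = -R.T @ t. A quick numerical check with an arbitrary R (90 degrees about \(Z\)) and t:

```python
import numpy as np

# Arbitrary example extrinsics.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])

T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t

C_from_pose = np.linalg.inv(T)[:3, 3]  # C = pose[:3, 3]
C_closed_form = -R.T @ t               # relies on R.T == inv(R)
print(C_closed_form)                   # [-2.  1. -3.]
print(np.allclose(C_from_pose, C_closed_form))  # True
```

Note that t is generally not the camera center; the two coincide only up to the rotation and sign shown above.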
Projection Matrix (P)
P is a (3, 4) camera projection matrix:
- P is the world-to-pixel projection matrix, which projects a point in homogeneous world coordinates to homogeneous pixel coordinates.
- P is the product of the intrinsic and extrinsic parameters:
      # P = K @ [R | t]
      P = K @ np.hstack([R, t[:, None]])
- P’s shape is (3, 4), not (4, 4).
- It is possible to decompose P into intrinsic and extrinsic matrices: an RQ decomposition of its left (3, 3) block P[:, :3] = K @ R recovers K and R (an RQ decomposition can be computed with a QR decomposition), after which t = np.linalg.inv(K) @ P[:, 3].
- Don’t confuse P with pose. Don’t confuse P with T.
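A minimal end-to-end sketch: build P from arbitrary example values, project a world point, then recover K, R and t again. The helper rq is hypothetical (not a camtools function); it computes an RQ decomposition from NumPy’s QR by flipping rows and columns, and it assumes P = K @ [R | t] up to a positive scale:

```python
import numpy as np

# Arbitrary example parameters (illustrative only).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
theta = np.deg2rad(20.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.1, -0.2, 3.0])

# P = K @ [R | t], shape (3, 4).
P = K @ np.hstack([R, t[:, None]])

# Project a homogeneous world point, then dehomogenize.
X_world = np.array([0.5, 0.25, 1.0, 1.0])
x_h = P @ X_world                 # lambda * [x, y, 1]
pixel = x_h[:2] / x_h[2]
depth = (R @ X_world[:3] + t)[2]  # lambda equals the camera-space depth
print(np.isclose(x_h[2], depth))  # True


def rq(M):
    """Hypothetical helper: M = U @ Q with U upper-triangular, Q orthogonal.

    Flips M, takes a QR decomposition of the transpose, then flips back.
    """
    J = np.flipud(np.eye(3))          # reversal (anti-identity) matrix
    Q_, U_ = np.linalg.qr((J @ M).T)  # (J @ M).T == Q_ @ U_
    U = J @ U_.T @ J                  # upper-triangular
    Q = J @ Q_.T                      # orthogonal
    # Make the diagonal of U positive, as focal lengths must be (D @ D == I).
    D = np.diag(np.sign(np.diag(U)))
    return U @ D, D @ Q


# Decompose P, assuming a positive overall scale.
K_est, R_est = rq(P[:, :3])
scale = K_est[2, 2]
K_est = K_est / scale                            # normalize so K[2, 2] == 1
t_est = np.linalg.solve(K_est, P[:, 3]) / scale  # undo the same scale on t
print(np.allclose(K_est, K), np.allclose(R_est, R), np.allclose(t_est, t))  # True True True
```

OpenCV’s cv2.decomposeProjectionMatrix and SciPy’s scipy.linalg.rq offer library alternatives; the pure-NumPy version above is only meant to show why a QR routine suffices.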