Camera Conventions
==================

.. only:: not latex

   .. image:: https://raw.githubusercontent.com/yxlao/camtools/main/camtools/assets/camera_coordinates_light.png
      :width: 520
      :align: center
      :alt: Camera Coordinates
      :class: only-light

   .. image:: https://raw.githubusercontent.com/yxlao/camtools/main/camtools/assets/camera_coordinates_dark.png
      :width: 520
      :align: center
      :alt: Camera Coordinates
      :class: only-dark

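A homogeneous point ``[X, Y, Z, 1]`` in world coordinates is projected to a
homogeneous point ``[x, y, 1]`` in image (pixel) coordinates by the following
equation, where :math:`\lambda` is the projective scale factor (the depth of the
point along the camera's :math:`Z` axis):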

.. math::

   \lambda
   \left[\begin{array}{l}
   x \\
   y \\
   1
   \end{array}\right]=\left[\begin{array}{ccc}
   f_{x} & 0 & c_{x} \\
   0 & f_{y} & c_{y} \\
   0 & 0 & 1
   \end{array}\right]\left[\begin{array}{llll}
   R_{00} & R_{01} & R_{02} & t_{0} \\
   R_{10} & R_{11} & R_{12} & t_{1} \\
   R_{20} & R_{21} & R_{22} & t_{2}
   \end{array}\right]\left[\begin{array}{c}
   X \\
   Y \\
   Z \\
   1
   \end{array}\right].

We follow the standard OpenCV-style camera coordinate system as illustrated at
the beginning of the documentation.
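
As a rough illustration of the projection equation, the sketch below projects one
world point to pixel coordinates with plain NumPy. The numeric values of ``K``,
``R``, ``t`` and ``point_world`` are made up for the example and are not taken
from camtools.

.. code-block:: python

   import numpy as np

   # Hypothetical example values, chosen only for illustration.
   K = np.array([[500.0, 0.0, 320.0],
                 [0.0, 500.0, 240.0],
                 [0.0, 0.0, 1.0]])
   R = np.eye(3)                  # camera axes aligned with the world axes
   t = np.array([0.0, 0.0, 2.0])  # world origin lies 2 units in front of the camera

   point_world = np.array([0.5, -0.25, 3.0, 1.0])  # homogeneous [X, Y, Z, 1]

   # lambda * [x, y, 1] = K @ [R | t] @ [X, Y, Z, 1]
   Rt = np.hstack([R, t[:, None]])  # (3, 4)
   pixel_h = K @ Rt @ point_world   # (3,), equals lambda * [x, y, 1]
   lam = pixel_h[2]                 # depth of the point along the camera Z axis
   x, y = pixel_h[:2] / lam         # divide by lambda to get pixel coordinates

Dividing by ``lam`` is what turns the homogeneous result into the actual pixel
location ``(x, y)``.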

Camera Coordinates
------------------

The camera coordinate system is right-handed, with :math:`X` pointing right,
:math:`Y` pointing down, and :math:`Z` pointing away from the camera along the
view direction. Note that the OpenCV convention (camtools' default) differs from
the OpenGL/Blender convention, where :math:`X` points right, :math:`Y` points up,
and :math:`Z` points backward, opposite to the view direction.

To convert between the OpenCV camera coordinates and the OpenGL-style coordinates,
use the conversion functions:

- ``ct.convert.T_opencv_to_opengl()``
- ``ct.convert.T_opengl_to_opencv()``
- ``ct.convert.pose_opencv_to_opengl()``
- ``ct.convert.pose_opengl_to_opencv()``
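
The signatures and implementation of these functions are not shown on this page.
The sketch below only illustrates the underlying idea, which is that the two
conventions differ by a flip of the camera's :math:`Y` and :math:`Z` axes. The
helper names ``sketch_T_opencv_to_opengl`` and ``sketch_pose_opencv_to_opengl``
are hypothetical and are not part of camtools.

.. code-block:: python

   import numpy as np

   # Flips the camera Y and Z axes. This matrix is its own inverse.
   FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

   def sketch_T_opencv_to_opengl(T_cv):
       """Hypothetical helper: W2C matrix with OpenCV axes -> OpenGL axes."""
       return FLIP_YZ @ T_cv

   def sketch_pose_opencv_to_opengl(pose_cv):
       """Hypothetical helper: C2W matrix with OpenCV axes -> OpenGL axes."""
       return pose_cv @ FLIP_YZ

   # Example extrinsic: identity rotation with a small translation.
   T_cv = np.eye(4)
   T_cv[:3, 3] = [0.1, 0.2, 0.3]

   T_gl = sketch_T_opencv_to_opengl(T_cv)
   pose_gl = sketch_pose_opencv_to_opengl(np.linalg.inv(T_cv))

   point_world = np.array([0.0, 0.0, 5.0, 1.0])
   p_cv = T_cv @ point_world  # in front of the camera: positive Z in OpenCV axes
   p_gl = T_gl @ point_world  # same point: negative Z and flipped Y in OpenGL axes

Because the flip is its own inverse, the same two products also convert from
OpenGL-style matrices back to OpenCV-style ones.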

Image Coordinates
-----------------

The origin is at the top-left corner of the image, with :math:`x` pointing right
(along the image width) and :math:`y` pointing down (along the image height).
This is consistent with OpenCV.

Note that the 0th dimension of the image array is the height (i.e., :math:`y`)
and the 1st dimension is the width (i.e., :math:`x`). That is:

- :math:`x` <=> ``u`` <=> width <=> column <=> the 1st dimension
- :math:`y` <=> ``v`` <=> height <=> row <=> the 0th dimension
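
The index order is easy to get wrong, so here is a minimal NumPy sketch. The
image size and pixel location are arbitrary example values.

.. code-block:: python

   import numpy as np

   height, width = 480, 640
   image = np.zeros((height, width, 3), dtype=np.uint8)  # shape is (H, W, 3)

   u, v = 600, 100  # pixel at x = 600 (column) and y = 100 (row)

   # Index as image[row, column], i.e. image[v, u] or image[y, x].
   image[v, u] = [255, 0, 0]

Writing ``image[u, v]`` here would raise an ``IndexError``, because ``u = 600``
exceeds the height of 480.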

Matrix Definitions
------------------

Camera Intrinsic (K)
^^^^^^^^^^^^^^^^^^^^

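``K`` is a ``(3, 3)`` camera intrinsic matrix, where ``fx`` and ``fy`` are the
focal lengths in pixels, ``(cx, cy)`` is the principal point, and ``s`` is the
skew (``0`` in the projection equation above):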

.. code-block:: python

   K = [[fx,  s, cx],
        [ 0, fy, cy],
        [ 0,  0,  1]]

Camera Extrinsic (T or W2C)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

``T`` is a ``(4, 4)`` camera extrinsic matrix:

.. code-block:: python

   T = [[R  | t   = [[R00, R01, R02, t0],
         0  | 1]]    [R10, R11, R12, t1],
                     [R20, R21, R22, t2],
                     [  0,   0,   0,  1]]

- ``T`` is also known as the world-to-camera ``W2C`` matrix, which transforms a
  point from world coordinates to camera coordinates.
- ``T``'s shape is ``(4, 4)``, not ``(3, 4)``.
- ``T`` is the inverse of ``pose``, i.e., ``np.linalg.inv(T) == pose``.
- The camera center ``C`` in world coordinates is mapped to ``[0, 0, 0, 1]`` in
  camera coordinates.
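
The properties listed above can be checked numerically. The following is a
minimal sketch with made-up values for ``R`` and ``t``. Comparisons use
``np.allclose`` because exact ``==`` rarely holds in floating point.

.. code-block:: python

   import numpy as np

   # Example rotation: 90 degrees about the Z axis.
   R = np.array([[0.0, -1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])
   t = np.array([1.0, 2.0, 3.0])

   # Assemble the (4, 4) extrinsic matrix T = [[R, t], [0, 1]].
   T = np.eye(4)
   T[:3, :3] = R
   T[:3, 3] = t

   # World -> camera for a homogeneous point.
   point_world = np.array([1.0, 0.0, 0.0, 1.0])
   point_camera = T @ point_world  # same as R @ [1, 0, 0] + t

   # The camera center maps to the origin of the camera frame.
   pose = np.linalg.inv(T)
   C = pose[:3, 3]
   center_in_camera = T @ np.append(C, 1.0)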

Rotation Matrix (R)
^^^^^^^^^^^^^^^^^^^

``R`` is a ``(3, 3)`` rotation matrix:

.. code-block:: python

   R = T[:3, :3]

- ``R`` is a rotation matrix. It is an orthogonal matrix with determinant 1, as
  rotations preserve volume and orientation.

  - ``R.T == np.linalg.inv(R)``
  - ``np.linalg.norm(R @ x) == np.linalg.norm(x)``, where ``x`` is a ``(3,)`` vector.
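
In floating point these identities hold only up to rounding error, so a practical
check uses tolerances. The sketch below tests an example rotation of 30 degrees
about the :math:`Z` axis. The variable names are illustrative only.

.. code-block:: python

   import numpy as np

   theta = np.deg2rad(30.0)
   R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta), np.cos(theta), 0.0],
                 [0.0, 0.0, 1.0]])
   x = np.array([1.0, 2.0, 3.0])

   # Orthogonality: R.T is the inverse of R.
   is_orthogonal = np.allclose(R.T @ R, np.eye(3))
   # Determinant +1 distinguishes a rotation from a reflection.
   has_unit_det = np.isclose(np.linalg.det(R), 1.0)
   # Rotations preserve vector length.
   preserves_norm = np.isclose(np.linalg.norm(R @ x), np.linalg.norm(x))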

Translation Vector (t)
^^^^^^^^^^^^^^^^^^^^^^

``t`` is a ``(3,)`` translation vector:

.. code-block:: python

   t = T[:3, 3]

- ``t``'s shape is ``(3,)``, not ``(3, 1)``.

Camera Pose (pose or C2W)
^^^^^^^^^^^^^^^^^^^^^^^^^

``pose`` is a ``(4, 4)`` camera pose matrix.

- ``pose`` is also known as the camera-to-world ``C2W`` matrix, which transforms a
  point from camera coordinates to world coordinates.
- ``pose`` is the inverse of ``T``, i.e., ``pose == np.linalg.inv(T)``.
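
Because ``R`` is orthogonal, the inverse of ``T`` also has a closed form:
``pose = [[R.T, -R.T @ t], [0, 1]]``. The sketch below compares it with the
generic matrix inverse, using made-up values for ``R`` and ``t``.

.. code-block:: python

   import numpy as np

   # Example rotation: 90 degrees about the Z axis.
   R = np.array([[0.0, -1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])
   t = np.array([1.0, 2.0, 3.0])

   T = np.eye(4)
   T[:3, :3] = R
   T[:3, 3] = t

   # Closed-form inverse of a rigid transform.
   pose_closed_form = np.eye(4)
   pose_closed_form[:3, :3] = R.T
   pose_closed_form[:3, 3] = -R.T @ t

   # Generic inverse, for comparison.
   pose_numeric = np.linalg.inv(T)

The translation column of the result, ``-R.T @ t``, is the camera position in
world coordinates.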

Camera Center (C)
^^^^^^^^^^^^^^^^^

``C`` is the camera center:

.. code-block:: python

   C = pose[:3, 3]

- ``C``'s shape is ``(3,)``, not ``(3, 1)``.
- ``C`` is the camera center in world coordinates. It is also the translation
  vector of ``pose``.

Projection Matrix (P)
^^^^^^^^^^^^^^^^^^^^^

``P`` is a ``(3, 4)`` camera projection matrix:

- ``P`` is the world-to-pixel projection matrix, which projects a point in
  homogeneous world coordinates to homogeneous pixel coordinates.
- ``P`` is the product of the intrinsic and extrinsic parameters:

  .. code-block:: python

    # P = K @ [R | t]
    P = K @ np.hstack([R, t[:, None]])

- ``P``'s shape is ``(3, 4)``, not ``(4, 4)``.
- ``P`` can be decomposed back into intrinsic and extrinsic matrices by applying
  an RQ decomposition (a variant of QR decomposition) to its left ``(3, 3)``
  block, which equals ``K @ R``.
- Don't confuse ``P`` with ``pose``. Don't confuse ``P`` with ``T``.
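
The decomposition mentioned above can be sketched with NumPy alone. The left
``(3, 3)`` block of ``P`` equals ``K @ R``, an upper-triangular matrix times an
orthogonal one, which is usually called an RQ factorization and can be built from
``np.linalg.qr``. The helpers ``rq_3x3`` and ``decompose_P`` are hypothetical
names written for this example, not camtools functions, and the sketch assumes
``P`` has a positive overall scale and an invertible left block.

.. code-block:: python

   import numpy as np

   def rq_3x3(A):
       """Factor A = U @ Q with U upper triangular and Q orthogonal."""
       J = np.flipud(np.eye(3))  # reversal permutation, J @ J == I
       Q_, R_ = np.linalg.qr((J @ A).T)
       return J @ R_.T @ J, J @ Q_.T

   def decompose_P(P):
       """Hypothetical helper: split P = K @ [R | t] into K, R, t."""
       U, Q = rq_3x3(P[:, :3])
       D = np.diag(np.sign(np.diag(U)))  # sign fix: make diag(K) positive
       K, R = U @ D, D @ Q               # D @ D == I, so K @ R is unchanged
       t = np.linalg.solve(K, P[:, 3])   # from K @ t = P[:, 3]
       return K / K[2, 2], R, t          # normalize so that K[2, 2] == 1

   # Round-trip check with made-up camera parameters.
   K_true = np.array([[500.0, 0.0, 320.0],
                      [0.0, 520.0, 240.0],
                      [0.0, 0.0, 1.0]])
   theta = np.deg2rad(30.0)
   R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
   t_true = np.array([0.1, -0.2, 1.5])

   P = K_true @ np.hstack([R_true, t_true[:, None]])
   K_est, R_est, t_est = decompose_P(P)

The sign fix matters because a plain factorization may return negative focal
lengths, which would pair with a reflected ``R`` instead of a proper rotation.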