
Coordinate Frames, Projections, and SE(3) for Autonomous Systems


Visual: transform tree from map to odom to base to sensors to image plane, with SE(3) composition, projection, and common frame-error points.

Coordinate conventions are not bookkeeping. They define what every position, velocity, covariance, detection, map feature, and control command means. A stack can have strong perception and planning models and still fail if one frame is left-handed, one timestamp is late, or one map origin is silently changed.

This page is the reusable foundation for 3D transforms, geodetic projections, ROS/Autoware frame semantics, and SE(3) notation across road AV, indoor AMR, outdoor industrial, and airport airside deployments.


1. AV, Indoor, Outdoor, and Airside Relevance

Domain | Why frames matter | Typical failure to prevent
Road AV | Cameras, LiDAR, radar, GNSS/INS, HD maps, predictions, and controls must agree on map, odom, base_link, and sensor frames. | Lane-level localization is correct in UTM, but detections are projected through an inverted camera extrinsic.
Indoor robots | GNSS is absent, so local map frames, dock frames, shelf frames, and fiducial frames carry all spatial authority. | A warehouse map is re-zeroed without updating route, costmap, and charging-dock frames.
Outdoor industrial | GNSS, local site survey control, dust/rain sensors, and large map tiles meet at local tangent planes. | A UTM zone, datum, or float-precision mistake creates meter-scale drift across a yard or mine.
Airside autonomy | RTK/INS, surveyed AMDB/AMXM geometry, LiDAR maps, aircraft stands, jet-blast zones, and ground-control procedures must share a frame chain. | map -> odom jumps are sent to controllers instead of staying in localization, causing a lateral command discontinuity near an aircraft.

2. Core Frame Vocabulary

2.1 Coordinate Frame Contract

A frame definition must answer four questions:

  1. Origin: Where is (0, 0, 0)?
  2. Axis directions: Which way do x, y, and z point?
  3. Units and handedness: Meters? Radians? Right-handed?
  4. Authority: Which component is allowed to publish or modify the transform?

ROS REP-103 standardizes SI units and right-handed frames. For vehicle body frames, the common convention is:

x = forward
y = left
z = up

Camera optical frames are different:

z = forward along optical axis
x = right in image
y = down in image

This optical convention is a common source of projection bugs. Never assume a camera frame is just base_link with an informally applied rotation; always publish the full calibrated transform.
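As a concrete check, the fixed rotation between a REP-103 body frame and a camera optical frame can be written down directly. A minimal numpy sketch (variable names are illustrative):

```python
import numpy as np

# Fixed rotation between a REP-103 body frame (x forward, y left, z up)
# and a camera optical frame (z forward, x right, y down).
# Column j of R_body_optical is optical axis j expressed in body coordinates:
#   optical x (image right) = -y_body
#   optical y (image down)  = -z_body
#   optical z (optical axis) = +x_body
R_body_optical = np.array([
    [ 0.0,  0.0, 1.0],
    [-1.0,  0.0, 0.0],
    [ 0.0, -1.0, 0.0],
])

p_body = np.array([5.0, 1.0, 0.2])     # 5 m ahead, 1 m left, slightly up
p_optical = R_body_optical.T @ p_body  # R_optical_body = R_body_optical^T
# A point ahead of the vehicle must land at positive optical z,
# and a point to the vehicle's left at negative image x.
```

A quick basis-vector test like this catches most hand-entered optical extrinsics before they reach a projection pipeline.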

2.2 Mobile Robot Frame Stack

ROS REP-105 defines a widely used mobile robot frame hierarchy:

earth -> map -> odom -> base_link -> sensor_frame
Frame | Meaning | Continuity
earth | Global Earth-fixed reference, usually ECEF or a geodetic anchor. | Globally stable.
map | Local world frame used by maps and global localization. | May jump when localization corrects drift.
odom | Locally smooth odometry frame. | Continuous, but drifts over time.
base_link | Vehicle body reference, often near the rear axle, center of gravity, or geometric center. | Moves with the vehicle.
Sensor frames | Camera, LiDAR, radar, IMU, GNSS antenna, and wheel odometry frames. | Fixed relative to base_link unless online calibration updates them.

For controllers, the key distinction is:

map: globally accurate but allowed to jump
odom: locally smooth but allowed to drift

Planning can work in map, but low-level control should consume a smooth pose or trajectory interface so localization corrections do not appear as physical vehicle motion.
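One way to honor this split is to re-express map-frame goals in odom immediately before control, so the feedback path never sees a map -> odom correction as vehicle motion. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def goal_in_odom(T_map_odom: np.ndarray, p_goal_map: np.ndarray) -> np.ndarray:
    """Re-express a map-frame goal point in odom. The controller then tracks a
    smooth odom-frame reference; a map -> odom jump moves the goal, not the
    measured vehicle state."""
    T_odom_map = np.linalg.inv(T_map_odom)
    return (T_odom_map @ np.append(p_goal_map, 1.0))[:3]
```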

2.3 ENU, NED, ECEF, and Geodetic Coordinates

Autonomy stacks commonly move between these representations:

Representation | Axes / values | Use
WGS84 geodetic | latitude, longitude, ellipsoidal height | GNSS receivers, survey data, aviation and map metadata.
ECEF | Earth-centered, Earth-fixed Cartesian meters | Global frame composition without local projection zones.
ENU | East, north, up local tangent plane | ROS-friendly local Cartesian frame for ground robots.
NED | North, east, down local tangent plane | Common in aerospace and some INS outputs.
UTM/MGRS | Projected grid meters by zone | HD maps, road-scale and site-scale mapping.
Local Cartesian | Tangent-plane meters around a chosen origin | Airports, warehouses, campuses, yards, and closed sites.

The ENU/NED distinction must be explicit at every INS, flight, or aviation integration boundary. A NED attitude or velocity fed into an ENU estimator often looks plausible until turns, slopes, or vertical motion expose the sign error.
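The ENU/NED swap is small enough to write out explicitly at the driver boundary. A numpy sketch (function names are illustrative):

```python
import numpy as np

# Proper rotation taking ENU vectors to NED (inverse is the transpose).
# Both frames are right-handed, so this is a valid rotation (det = +1).
R_NED_ENU = np.array([
    [0.0, 1.0,  0.0],   # NED north = ENU north (y)
    [1.0, 0.0,  0.0],   # NED east  = ENU east  (x)
    [0.0, 0.0, -1.0],   # NED down  = -ENU up
])

def enu_to_ned(v_enu: np.ndarray) -> np.ndarray:
    """Convert a position/velocity vector from ENU to NED."""
    return R_NED_ENU @ v_enu

def heading_ned_from_yaw_enu(yaw_enu: float) -> float:
    """ENU yaw (CCW from east) -> NED heading (CW from north), wrapped to [-pi, pi)."""
    h = np.pi / 2.0 - yaw_enu
    return (h + np.pi) % (2.0 * np.pi) - np.pi
```

Doing this once in the driver, with the frame named explicitly, avoids the sign errors that only show up in turns or vertical motion.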


3. SE(3) Transform Fundamentals

3.1 Homogeneous Transform Notation

Use a transform T_A_B to mean "pose of frame B expressed in frame A" or "transform points from B coordinates into A coordinates."

p_A = T_A_B * p_B

A rigid 3D transform in SE(3) is:

T_A_B = [ R_A_B  t_A_B ]
        [ 0 0 0    1   ]

where:

  • R_A_B is a 3x3 rotation matrix in SO(3)
  • t_A_B is the origin of frame B expressed in frame A
  • p_B is written as homogeneous coordinates [x, y, z, 1]^T
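The definitions above can be sketched directly in numpy (helper names are illustrative):

```python
import numpy as np

def make_T(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Assemble a 4x4 homogeneous transform T_A_B from R_A_B and t_A_B."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transform_point(T_A_B: np.ndarray, p_B: np.ndarray) -> np.ndarray:
    """p_A = T_A_B * p_B, using homogeneous coordinates internally."""
    p_h = np.append(p_B, 1.0)
    return (T_A_B @ p_h)[:3]
```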

3.2 Composition and Inversion

Transform composition follows the frame chain:

T_A_C = T_A_B * T_B_C
p_A   = T_A_B * T_B_C * p_C

The inverse transform is:

T_B_A = inv(T_A_B)
R_B_A = R_A_B^T
t_B_A = -R_A_B^T * t_A_B

Most calibration bugs are direction bugs. A file named camera_to_lidar.yaml is ambiguous unless it states whether it stores T_camera_lidar or T_lidar_camera and ships with a projection sanity test.
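The analytic inverse is worth implementing once and checking against the generic matrix inverse. A minimal sketch:

```python
import numpy as np

def invert_T(T_A_B: np.ndarray) -> np.ndarray:
    """Analytic SE(3) inverse: R_B_A = R_A_B^T, t_B_A = -R_A_B^T t_A_B.
    Cheaper and better conditioned than a generic 4x4 matrix inverse."""
    R, t = T_A_B[:3, :3], T_A_B[:3, 3]
    T_B_A = np.eye(4)
    T_B_A[:3, :3] = R.T
    T_B_A[:3, 3] = -R.T @ t
    return T_B_A
```

A round-trip check (T_B_A composed with T_A_B equals identity) makes a good calibration-CI assertion for every extrinsic file.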

3.3 Lie Algebra, Small Errors, and Covariance

SE(3) poses live on a manifold, not in Euclidean vector space. Small pose errors are represented in the tangent space se(3) as a 6-vector:

xi = [omega_x, omega_y, omega_z, v_x, v_y, v_z]
T  = Exp(xi)
xi = Log(T)

This matters for optimization and filtering:

  • A pose residual should usually be computed with Log(T_measured^-1 * T_estimated).
  • Orientation uncertainty should be represented in a local tangent space, not by directly subtracting quaternions.
  • Covariances attached to poses must define both the frame and the ordering of the 6-vector.
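As one concrete option, a pose residual can be computed with SciPy's rotation utilities. This sketch uses the decoupled SO(3) x R^3 tangent space, a common estimator choice; the full SE(3) logarithm additionally couples rotation into the translation part:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_residual(T_meas: np.ndarray, T_est: np.ndarray) -> np.ndarray:
    """6-vector residual [omega; v] of T_meas^-1 * T_est, using the decoupled
    SO(3) x R^3 tangent space. Ordering matches the xi convention above."""
    R_m, t_m = T_meas[:3, :3], T_meas[:3, 3]
    dR = R_m.T @ T_est[:3, :3]            # rotation part of T_meas^-1 * T_est
    dt = R_m.T @ (T_est[:3, 3] - t_m)     # translation part, in the measured frame
    omega = Rotation.from_matrix(dR).as_rotvec()  # Log on SO(3)
    return np.concatenate([omega, dt])
```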

For uncertainty propagation through transform composition, the SE(3) adjoint is the standard tool:

xi_A = Ad_T_A_B * xi_B
Sigma_A = Ad_T_A_B * Sigma_B * Ad_T_A_B^T

If a LiDAR detection covariance is reported in the sensor frame and later fused in map, this covariance rotation is not optional.
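With the [omega; v] ordering used above, the adjoint and the covariance propagation can be sketched as (helper names are illustrative):

```python
import numpy as np

def skew(v: np.ndarray) -> np.ndarray:
    """3x3 skew-symmetric matrix such that skew(v) @ w == cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def adjoint(T_A_B: np.ndarray) -> np.ndarray:
    """SE(3) adjoint for twists ordered [omega; v], matching xi above."""
    R, t = T_A_B[:3, :3], T_A_B[:3, 3]
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[3:, :3] = skew(t) @ R
    Ad[3:, 3:] = R
    return Ad

def propagate_cov(T_A_B: np.ndarray, Sigma_B: np.ndarray) -> np.ndarray:
    """Sigma_A = Ad_T_A_B * Sigma_B * Ad_T_A_B^T."""
    Ad = adjoint(T_A_B)
    return Ad @ Sigma_B @ Ad.T
```

Note that a different 6-vector ordering (translation first) swaps the block structure, which is exactly why the ordering must be part of the covariance metadata.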


4. Projection and Sensor Geometry

4.1 Camera Projection

The pinhole camera projection from a 3D map point to pixels is:

p_camera = T_camera_map * p_map
[u, v, 1]^T ~ K * [x/z, y/z, 1]^T

where K is the camera intrinsic matrix. In a full chain:

T_camera_map = T_camera_base * T_base_odom * T_odom_map

Practical checks:

  • Points at or behind the camera have z <= 0 in the optical frame and must be culled before projection.
  • Straight vertical poles should not curve after projection unless lens distortion is still present.
  • Reprojected LiDAR points should align with image edges under braking, turning, and vibration, not only in static scenes.
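The projection chain and the behind-camera check can be combined in one small helper (a sketch; names are illustrative):

```python
import numpy as np

def project(K: np.ndarray, T_cam_map: np.ndarray, p_map: np.ndarray):
    """Project a 3D map-frame point to pixel coordinates. Returns None when
    the point is at or behind the camera (z <= 0 in the optical frame)."""
    p_cam = T_cam_map[:3, :3] @ p_map + T_cam_map[:3, 3]
    if p_cam[2] <= 0.0:
        return None
    uvw = K @ (p_cam / p_cam[2])  # normalize by optical-axis depth, then apply intrinsics
    return uvw[:2]
```

This assumes distortion has already been removed (rectified image); otherwise the distortion model must be applied between normalization and the intrinsic matrix.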

4.2 LiDAR, Radar, and BEV Projection

LiDAR and radar often feed bird's-eye-view grids:

i = floor((x_base - x_min) / resolution)
j = floor((y_base - y_min) / resolution)

The grid definition must state:

  • source frame (base_link, map, or sensor frame)
  • resolution
  • origin and cell-center convention
  • row/column orientation
  • whether ego motion compensation has already been applied
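A grid lookup that makes these choices explicit might look like this (a sketch under stated assumptions: row i along x, column j along y, origin at the cell corner):

```python
import numpy as np

def bev_index(p_base: np.ndarray, x_min: float, y_min: float,
              resolution: float, rows: int, cols: int):
    """Map a base_link point to a BEV cell (i, j), or None if out of range.
    Assumes row i along x, column j along y, origin at the cell corner."""
    i = int(np.floor((p_base[0] - x_min) / resolution))
    j = int(np.floor((p_base[1] - y_min) / resolution))
    if 0 <= i < rows and 0 <= j < cols:
        return i, j
    return None
```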

For radar, preserve Doppler frame semantics. Radial velocity is measured along the radar line of sight and must be transformed carefully before it becomes vx or vy in vehicle coordinates.
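A hedged sketch of the expected Doppler for a target, assuming the radar reports a unit line-of-sight vector in its own frame and all velocities are expressed in base_link (negative means closing under this sign convention):

```python
import numpy as np

def expected_doppler(R_base_radar: np.ndarray, t_base_radar: np.ndarray,
                     los_radar: np.ndarray, v_target_base: np.ndarray,
                     v_ego_base: np.ndarray, omega_ego_base: np.ndarray) -> float:
    """Expected radial velocity seen by the radar for a target.
    Directions only rotate (no translation); the ego velocity at the radar
    mount includes the lever-arm term omega x r."""
    los_base = R_base_radar @ los_radar
    v_sensor_base = v_ego_base + np.cross(omega_ego_base, t_base_radar)
    return float(los_base @ (v_target_base - v_sensor_base))
```

The lever-arm term matters on turning vehicles with radars mounted far from the body reference point.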

4.3 Map Projections

For site-scale autonomy, a local Cartesian or projected map is usually better than doing runtime planning directly in latitude/longitude. Recommended pattern:

WGS84 / survey control
  -> ECEF or map projection
  -> local metric map frame
  -> odom
  -> base_link

Deployment notes:

  • Put the projection origin near the operating site.
  • Record datum, epoch, projection type, zone, and origin in map metadata.
  • Avoid large absolute coordinates in float32 ML or GPU kernels. Subtract a local origin before rasterization or tensor encoding.
  • Do not mix ellipsoidal height, geoid height, and local floor elevation without a named conversion.
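The float32 point is easy to demonstrate with UTM-scale values (the coordinates here are illustrative):

```python
import numpy as np

# UTM coordinates are ~1e5-1e7 m. float32 carries about 7 significant digits,
# so at a northing near 4.6e6 m the representable spacing is 0.5 m: centimeter
# structure is destroyed unless a local origin is subtracted *before* the cast.
utm_xy = np.array([[500000.123, 4649776.456]])   # easting, northing
origin = np.array([500000.0, 4649776.0])          # local map origin near the site

as_f32 = utm_xy.astype(np.float32)                # fractional meters are rounded away
local_f32 = (utm_xy - origin).astype(np.float32)  # [0.123, 0.456] survives intact
```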

For airports, the aerodrome reference point or a surveyed control point is a natural local origin, but final authority should follow the survey/control process used by the airport and mapping team.


5. Practical Deployment Notes

5.1 Frame Authority Matrix

Maintain a single owner for each transform:

Transform | Typical owner
map -> odom | Localization estimator.
odom -> base_link | Odometry or fused state estimator.
base_link -> sensor | Calibration artifact loaded as static TF.
earth -> map | Map loader or georeference service.
map -> dock/stand/zone | Map or operations layer.

Do not let multiple nodes publish the same transform unless there is a clearly defined arbitration mechanism.

5.2 Minimum Frame Metadata for Logs

Every dataset and incident log should preserve:

  • TF tree or equivalent transform graph
  • static transform files and their version IDs
  • map projection metadata
  • sensor frame_id values
  • timestamp source for each topic
  • covariance frame and variable ordering
  • vehicle reference point definition

Without this metadata, replayed incidents can be impossible to reproduce.

5.3 Sanity Tests

Before deployment, run these checks:

  1. Project LiDAR points into all cameras while driving a loop with turns and stops.
  2. Verify a known surveyed point appears at the expected map coordinate.
  3. Confirm positive yaw direction with a slow left turn.
  4. Confirm map -> odom corrections do not create controller discontinuities.
  5. Validate covariance ellipses visually after transforming them between frames.
  6. Replay one bag with all static transforms removed except the checked-in calibration files.

6. Failure Modes and Risks

Failure mode | Symptom | Mitigation
ENU/NED mix-up | Vertical or yaw signs look inverted; aircraft/INS integrations disagree with ROS. | Convert at the driver boundary and name frames with _ned only when truly NED.
Left-handed frame | Mirrored detections or lane geometry. | Enforce REP-103 right-handed frames and run basis-vector tests.
Extrinsic direction reversal | Reprojected objects appear displaced in a way that grows with range. | Use explicit T_target_source naming and include inverse tests in calibration CI.
map correction sent to controller | Vehicle command jumps after a localization update. | Keep the controller reference smooth in odom or trajectory-relative coordinates.
Wrong projection zone or datum | Site map aligns locally but drifts or offsets globally. | Record projection metadata and compare against surveyed control points.
Float precision loss | GPU BEV or map tensors show quantization at large coordinates. | Use local origins for ML and rasterization.
Covariance in wrong frame | Fusion becomes overconfident or rejects good measurements. | Transform covariance with the rotation or SE(3) adjoint.
Timestamped transform lookup error | Moving objects smear or sensor fusion is biased during turns. | Query transforms at measurement acquisition time, not processing time.

