
PTP Grandmaster Failover and BMCA

Last updated: 2026-05-09

Why It Matters

The best clock in the vehicle is a safety-critical dependency for fused perception. When the active grandmaster loses GNSS, resets, or is unplugged, the vehicle should not discover its failover behavior for the first time during an incident. BMCA, clock class, priority fields, holdover state, and network admission rules must be designed, tested, and diagnosed.

BMCA is the Best Master Clock Algorithm used by PTP/gPTP domains to select the grandmaster from announced clock data. A production AV should treat BMCA inputs as a controlled configuration surface, not as a passive default.

Deployment Contract

| Contract item | Practical rule |
| --- | --- |
| GM allowlist | Only approved timing devices may become grandmaster on the vehicle time domain. |
| Priority plan | Set priority1, clockClass, clockAccuracy, offsetScaledLogVariance, and priority2 deliberately for primary, backup, and lab modes. |
| Holdover policy | A GNSS-disciplined GM in holdover must advertise a worse class or trigger a degraded timing state according to the safety case. |
| Failover budget | Define maximum allowed time error, convergence time, and data rejection window during GM loss and recovery. |
| Switch behavior | Boundary or time-aware switches must propagate the new GM consistently and expose port state changes. |
| Sensor behavior | Sensors must report lock loss, last sync age, and timestamp mode through telemetry. |
| Recorder evidence | Logs must capture GM identity changes, BMCA data sets, offset, path delay, and degraded-mode transitions. |
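
The holdover policy and the failover budget can be connected with a simple error-growth estimate. The following is a minimal sketch, assuming a worst-case quadratic model (initial offset, plus a constant fractional frequency offset, plus linear aging). The function name and every numeric input are hypothetical; real values would come from the oscillator datasheet and the safety case.

```python
import math

def holdover_seconds_remaining(budget_ns, initial_offset_ns, freq_offset_ppb, aging_ppb_per_s):
    """Seconds until |time error| reaches budget_ns under the assumed model
    e(t) = e0 + f*t + 0.5*a*t^2, with all terms taken as worst-case positive.
    1 ppb of frequency offset accumulates 1 ns of time error per second."""
    e0 = abs(initial_offset_ns)
    f = abs(freq_offset_ppb)
    a = abs(aging_ppb_per_s)
    room = budget_ns - e0
    if room <= 0:
        return 0.0                      # budget already consumed
    if a == 0:
        return math.inf if f == 0 else room / f
    # Positive root of 0.5*a*t^2 + f*t - room = 0
    return (-f + math.sqrt(f * f + 2.0 * a * room)) / a
```

A degraded-mode supervisor could compare this estimate against the time elapsed since GNSS loss and end the mission before the budget runs out, rather than after.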

BMCA Inputs to Control

| Data set field | Design use |
| --- | --- |
| priority1 | Administrative override, compared first. Use it to prefer the production GM over backup or service tools. |
| clockClass | Describes traceability and holdover quality. It should degrade when GNSS lock or time integrity degrades. |
| clockAccuracy | Advertised accuracy class. Do not overstate it during holdover. |
| offsetScaledLogVariance | Advertised stability estimate, compared after clockAccuracy and before priority2. |
| priority2 | Tie-breaker among clocks of the same role or class. |
| clockIdentity | Stable identity for allowlisting and incident logs. |
| domainNumber | Isolation between production, lab, and depot timing domains. |
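
The comparison order of these fields matters when writing the priority plan. Below is a simplified sketch of the grandmaster data set comparison: lower values win, and fields are compared in the order priority1, clockClass, clockAccuracy, offsetScaledLogVariance, priority2, clockIdentity. It deliberately omits the stepsRemoved and port-identity steps of the full algorithm. The class name, the `name` label, and all field values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnouncedClock:
    name: str                        # label for this sketch only, not a PTP field
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: str              # fixed-length hex string, so string order matches numeric order

    def rank(self):
        # Lower tuple wins; the field order mirrors the comparison order.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2, self.clock_identity)

def best_master(candidates):
    """Return the candidate that wins the simplified comparison."""
    return min(candidates, key=lambda c: c.rank())

# Hypothetical values: the primary is in holdover (class 7), the backup is locked (class 6).
primary_holdover = AnnouncedClock("primary", 10, 7, 0x22, 0x4E5D, 10, "0011223344556677")
backup_locked = AnnouncedClock("backup", 20, 6, 0x21, 0x4E5D, 20, "0011223344558899")

print(best_master([primary_holdover, backup_locked]).name)  # prints "primary"
```

Because priority1 is compared before clockClass, the primary in this example keeps winning even after it degrades to holdover. If the intent is for a locked backup to take over, the two devices would need equal priority1 values so that clockClass decides.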

Example vehicle policy:

| Role | State | BMCA intent | Mission policy |
| --- | --- | --- | --- |
| Primary GNSS GM | GNSS locked and integrity OK | Wins BMCA. | Full mission allowed. |
| Primary GNSS GM | Holdover, bounded error | May remain GM but declares degraded class. | Continue only inside holdover budget. |
| Backup GM | GNSS locked or disciplined from primary site time | Wins if primary fails. | Mission may continue after convergence gate. |
| Compute PHC | Ordinary clock only | Never wins in production. | Not a GM candidate. |
| Service laptop | Lab domain only | Never appears on production domain. | Block or isolate if detected. |
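
A policy table like this can be turned into a small runtime check on the observed grandmaster. The sketch below is one possible shape, not a reference implementation: the allowlist entries, the production domain number, and the returned action strings are all hypothetical. The clockClass values 6 (synchronized to a primary reference) and 7 (holdover within specification) follow the IEEE 1588 definitions.

```python
# Hypothetical allowlist: clockIdentity -> role. Real identities come from the vehicle build record.
GM_ALLOWLIST = {
    "001122.fffe.334455": "primary",
    "001122.fffe.667788": "backup",
}
PRODUCTION_DOMAIN = 0   # assumed production domainNumber
CLASS_LOCKED = 6        # synchronized to a primary reference
CLASS_HOLDOVER = 7      # holdover, within specification

def mission_policy(gm_identity, domain_number, clock_class):
    """Map the observed grandmaster to a mission action (action names are illustrative)."""
    if domain_number != PRODUCTION_DOMAIN or gm_identity not in GM_ALLOWLIST:
        return "reject_timebase"                  # rogue, lab, or service device
    if clock_class == CLASS_LOCKED:
        return "full_mission"
    if clock_class == CLASS_HOLDOVER:
        return "degraded_within_holdover_budget"
    return "reject_timebase"                      # allowlisted device advertising an unexpected class
```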

Failover Sequence

```text
Normal:
  primary GM -> switches -> sensors and compute

Primary GNSS degraded:
  diagnostics marks holdover
  BMCA data changes if policy requires it
  fusion tracks growing time uncertainty

Primary GM lost:
  announce timeout expires
  backup GM wins BMCA
  ptp4l ports transition and servo reconverges
  sensors report relock or free-run state
  fusion rejects data during the configured uncertainty window

Primary recovers:
  recovery waits for stability and hysteresis
  BMCA may switch back only if policy allows
  recorder marks the second time discontinuity risk window
```

Use hysteresis. Repeated GM flapping is often worse for fusion than staying on a slightly degraded but bounded holdover source.
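
One way to express the failback hysteresis is a gate that permits a switch back only after the recovered primary has been healthy and within an offset bound for a continuous holdoff period. This is a minimal sketch with hypothetical names and thresholds. How the decision is enforced (for example, by changing what the recovering GM advertises) is deployment-specific and not shown.

```python
class FailbackGate:
    """Allow failback only after a continuous period of good health and bounded offset."""

    def __init__(self, stable_s, max_offset_ns):
        self.stable_s = stable_s            # required continuous good time, seconds
        self.max_offset_ns = max_offset_ns  # bound on primary-vs-active offset
        self._good_since = None

    def update(self, now_s, primary_healthy, offset_ns):
        """Feed one observation; return True when failback may proceed."""
        good = primary_healthy and abs(offset_ns) <= self.max_offset_ns
        if not good:
            self._good_since = None         # any bad sample restarts the holdoff
            return False
        if self._good_since is None:
            self._good_since = now_s
        return now_s - self._good_since >= self.stable_s
```

Resetting the timer on every bad sample is the design choice that prevents flapping: a primary that keeps losing and regaining GNSS never accumulates enough stable time to win back the role.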

Failure Modes

| Failure mode | Symptom | Response |
| --- | --- | --- |
| Rogue GM wins BMCA | Grandmaster identity changes to an unexpected device. | Isolate port, reject timebase, and inhibit mission start or continue in degraded mode. |
| Backup GM has better priority but worse time | Vehicle switches to a lab or stale source. | Version priority plan and enforce GM allowlist plus clock-class checks. |
| Holdover over-advertised | GM keeps a high-quality class after GNSS loss. | Require holdover state from timing appliance and cross-check with GNSS/UTC source. |
| Failback flapping | GM identity changes repeatedly after GNSS reacquisition. | Add holdoff timers and require stable offset before failback. |
| Boundary switch asymmetry | Different sensor branches follow different GMs. | Monitor port parent identity and timebase per branch. |
| Servo step accepted as normal | Estimator sees sudden ego-motion inconsistency. | Gate fusion on timebase-change events and reset synchronization filters deliberately. |
| Announce blocked by network policy | Endpoints stay on stale master or free-run. | Test multicast/VLAN filtering and PTP queue policy under security rules. |
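
Gating fusion on timebase-change events, as suggested for the servo-step case, might look like the sketch below. It assumes the caller supplies a monotonic timestamp, the currently observed grandmaster identity, and a servo-locked flag. The class name and the window length are hypothetical.

```python
class TimebaseGate:
    """Reject sensor data for a fixed window after any grandmaster identity change."""

    def __init__(self, reject_window_s):
        self.reject_window_s = reject_window_s
        self._gm = None
        self._changed_at = None

    def allow(self, now_s, gm_identity, servo_locked):
        """Return True if data stamped now may be fused."""
        if self._gm is not None and gm_identity != self._gm:
            self._changed_at = now_s        # timebase-change event starts the window
        self._gm = gm_identity
        if not servo_locked:
            return False                    # never fuse while the servo is unlocked
        if self._changed_at is not None and now_s - self._changed_at < self.reject_window_s:
            return False                    # still inside the uncertainty window
        return True
```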

Telemetry

Capture these at 1 Hz or on change, with higher-rate samples during faults:

  • Active grandmaster identity, parent identity, domain, and profile.
  • Local port state and selected best-master reason.
  • Clock class, accuracy, variance, priorities, time source, and UTC offset.
  • Offset from master, mean path delay, frequency adjustment, servo state.
  • Announce timeout count, sync timeout count, sequence gaps, and packet loss.
  • Holdover state, GNSS lock, PPS validity, antenna alarm, and time integrity flags from the timing source.
  • Per-sensor PTP lock, last sync age, and timestamp mode.
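
The capture rule (1 Hz, on change, faster during faults) can be sketched as a single decision function. The function name, the snapshot dictionaries, and the 0.1 s fault period are assumptions for illustration only.

```python
def should_record(now_s, last_record_s, snapshot, last_snapshot, fault_active,
                  base_period_s=1.0, fault_period_s=0.1):
    """Decide whether to write a telemetry record for the current snapshot."""
    if snapshot != last_snapshot:
        return True                         # record immediately on any change
    period = fault_period_s if fault_active else base_period_s
    return now_s - last_record_s >= period  # otherwise record at the periodic rate
```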

Useful commands:

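```bash
# Local clock's own attributes: priorities, clockClass, clockAccuracy, domainNumber
pmc -u -b 0 'GET DEFAULT_DATA_SET'
# Steps removed, offset from master, and mean path delay
pmc -u -b 0 'GET CURRENT_DATA_SET'
# Parent port identity and the grandmaster's advertised attributes
pmc -u -b 0 'GET PARENT_DATA_SET'
# linuxptp-specific status, including GM identity and whether a GM is present
pmc -u -b 0 'GET TIME_STATUS_NP'
# Per-port state and interval settings
pmc -u -b 0 'GET PORT_DATA_SET'
```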
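
For logging, the output of these commands can be reduced to key/value pairs. The sketch below assumes the indented `name value` layout that pmc typically prints. The sample text is illustrative and was not captured from a real device, and the parser name is hypothetical. In practice the text would come from running one of the commands above and capturing its standard output.

```python
def parse_pmc_fields(text):
    """Collect 'name value' lines into a dict, skipping header lines."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Field lines are assumed to have exactly two whitespace-separated tokens.
        if len(parts) == 2 and not parts[0].endswith(":"):
            fields[parts[0]] = parts[1]
    return fields

# Illustrative sample only (assumed layout, made-up identities).
SAMPLE = """sending: GET PARENT_DATA_SET
    001122.fffe.334455-0 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
        grandmasterPriority1 10
        gm.ClockClass        6
        grandmasterIdentity  001122.fffe.334455
"""

parsed = parse_pmc_fields(SAMPLE)
print(parsed["grandmasterIdentity"])  # prints "001122.fffe.334455"
```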

Validation Hooks

  • Pull GNSS antenna from the primary GM and verify clock-class/holdover telemetry changes before the time error budget is exceeded.
  • Power-cycle the primary GM while the vehicle logs high-bandwidth sensors and confirm backup GM convergence within the failover budget.
  • Connect a service laptop running PTP in the production VLAN and confirm it cannot become GM.
  • Reintroduce the primary GM and check failback hysteresis.
  • Replay the event and verify fusion rejected or downweighted data during the documented uncertainty window.
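
The convergence check in these hooks can be automated against recorded offset samples. This is a minimal sketch, assuming a time-sorted list of `(time_s, offset_ns)` pairs and a known GM-change time. The function name, bound, and settle time are hypothetical and would be taken from the failover budget.

```python
def convergence_time_s(samples, gm_change_s, bound_ns, settle_s):
    """Seconds after gm_change_s at which |offset| entered the bound and then
    stayed inside it for at least settle_s. Returns None if that never happens."""
    streak_start = None
    for t_s, offset_ns in samples:
        if t_s < gm_change_s:
            continue                        # ignore samples before the event
        if abs(offset_ns) <= bound_ns:
            if streak_start is None:
                streak_start = t_s
            if t_s - streak_start >= settle_s:
                return streak_start - gm_change_s
        else:
            streak_start = None             # excursion outside the bound restarts the streak
    return None
```

A test harness could then assert that the returned value is not `None` and does not exceed the documented failover budget.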

Sources

Public research notes collected from public sources.