PREEMPT_RT validation harness
Detailed design for the PREEMPT_RT validation harness sub-feature (FEAT_0022). The harness is packaged as an out-of-tree cargo bin and consumes the Scan-cycle observability (FEAT_0021) telemetry push channel as its sole measurement path. This page also carries the design of the cycle-overrun fault primitive (FEAT_0018) and of the framework internal-fault model (FEAT_0024).
Context. Documented worst-case jitter (REQ_0110) requires a documented worst-case jitter envelope. The usual ASPICE / industrial pattern is to wire a benchmark gate into CI so that regressions block merge. GitHub-hosted cloud runners do not run PREEMPT_RT and cannot be made to do so without self-hosting. A self-hosted PREEMPT_RT runner for a single-maintainer personal project carries ongoing infrastructure cost (host availability, kernel updates, runner-agent updates).

Decision. Package the harness as an out-of-tree cargo bin
under Alternatives considered.
Consequences. ✅ Zero ongoing infrastructure cost; runs happen on demand, at the maintainer's discretion.
✅ The harness path is identical to the production telemetry path (per Harness consumes runtime te... (REQ_0113)), so a manual run is representative of production behaviour.
❌ Regressions can land between manual runs. This is partly mitigated by keeping Allocation-free telemetry u... (TEST_0194) (allocation-free telemetry update) and Overrun counter increments ... (TEST_0192) (overrun counter correctness) in regular CI; what the harness uniquely validates is the absolute envelope, not behavioural correctness.
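A worst-case jitter envelope of this kind can be reduced from per-cycle start timestamps received over the telemetry channel. What follows is a minimal sketch. It assumes that each sample carries a monotonic cycle-start time in nanoseconds, and that jitter is measured as the deviation of each observed inter-cycle period from the nominal scan period. `JitterEnvelope` and `envelope_from_starts` are hypothetical names and are not part of the Scan-cycle observability (FEAT_0021) API.

```rust
/// Summary of period jitter over one harness run
/// (hypothetical type, not the taktora telemetry API).
#[derive(Debug, PartialEq)]
pub struct JitterEnvelope {
    /// Most negative deviation from the nominal period (cycle early), ns.
    pub min_dev_ns: i64,
    /// Most positive deviation from the nominal period (cycle late), ns.
    pub max_dev_ns: i64,
    /// Largest absolute deviation: the worst-case jitter figure, ns.
    pub worst_abs_ns: u64,
    /// Number of inter-cycle periods examined.
    pub periods: usize,
}

/// Reduce monotonic cycle-start timestamps (ns) to a jitter envelope.
/// Returns None when fewer than two samples are available.
pub fn envelope_from_starts(starts_ns: &[u64], nominal_period_ns: u64) -> Option<JitterEnvelope> {
    if starts_ns.len() < 2 {
        return None;
    }
    let nominal = nominal_period_ns as i64;
    let mut min_dev = i64::MAX;
    let mut max_dev = i64::MIN;
    for pair in starts_ns.windows(2) {
        // Observed period between two consecutive cycle starts.
        let observed = pair[1] as i64 - pair[0] as i64;
        let dev = observed - nominal;
        min_dev = min_dev.min(dev);
        max_dev = max_dev.max(dev);
    }
    Some(JitterEnvelope {
        min_dev_ns: min_dev,
        max_dev_ns: max_dev,
        worst_abs_ns: min_dev.unsigned_abs().max(max_dev.unsigned_abs()),
        periods: starts_ns.len() - 1,
    })
}
```

Consider a 1 ms nominal period with starts at 0, 1.000, 2.040 and 3.000 ms. The single late cycle produces deviations of +40 µs and -40 µs, so the worst-case figure is 40 µs. Measuring lateness against an ideal release grid instead would count that cycle only once. Which definition the documented envelope uses is therefore worth fixing explicitly.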
Context. Cyclictest-style benchmark ... (REQ_0111) requires a representative, repeatable load profile for the jitter harness. The recognised prior art is the ROS 2 real-time working group reference system. It consists of a fixed, version-controlled node graph (sensor / transform / fusion / cyclic / command archetypes) with a designated hot path, a per-node CPU calibration tool, and a defined KPI set (hot-path latency, cyclic-node period jitter, dropped samples). Two postures are possible. The first is a faithful port of that graph, so that taktora numbers compare apples-to-apples with published reference-system results. The second is an adapted graph shaped for motion control.

Decision. Adapt; do not port faithfully. Reuse the reference system’s node archetypes, KPI definitions, and per-node CPU calibration methodology, but lay out a smaller topology shaped like a motion-control application: a cyclic NC-style node on the hot path feeding setpoints, with auxiliary sensor/fusion nodes off the hot path.

Alternatives considered.
Consequences. ✅ The measured load resembles a real taktora motion deployment, so the envelope is meaningful for the product’s actual use.
✅ KPI definitions and per-node calibration are inherited, which keeps the harness rigorous and tier-portable.
❌ Numbers are not directly comparable to published reference-system executor results, because the graph differs. This is documented as a deliberate trade-off: domain relevance over cross-framework comparability.
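Two of the inherited KPIs, dropped samples and hot-path latency, reduce to simple arithmetic over the telemetry stream. The following is a minimal sketch. It assumes that each sample carries a per-node sequence number plus hot-path input and command-output timestamps taken from one monotonic clock. The `Sample` layout and both function names are hypothetical and are not taken from the reference system's tooling or from taktora.

```rust
/// One telemetry sample as the harness might receive it (hypothetical layout).
#[derive(Clone, Copy, Debug)]
pub struct Sample {
    /// Per-node sequence number, incremented once per published sample.
    pub seq: u64,
    /// Time the hot-path input was produced, ns (monotonic clock).
    pub input_ns: u64,
    /// Time the resulting command was emitted, ns (same monotonic clock).
    pub output_ns: u64,
}

/// Count samples missing from the stream, inferred from sequence-number gaps.
/// Assumes in-order delivery and no sequence-number wrap-around.
pub fn dropped_samples(samples: &[Sample]) -> u64 {
    samples
        .windows(2)
        .map(|w| w[1].seq.saturating_sub(w[0].seq).saturating_sub(1))
        .sum()
}

/// Worst-case hot-path latency (input to command output), ns.
/// Returns None for an empty stream.
pub fn worst_hot_path_latency_ns(samples: &[Sample]) -> Option<u64> {
    samples
        .iter()
        .map(|s| s.output_ns.saturating_sub(s.input_ns))
        .max()
}
```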
Workspace member
CLI shape:

    cargo xtask preempt-rt-bench \
        --load-profile {idle,cpu-stress,cyclictest-coexist} \
        --cycle-count <N> \
        --task-count <K> \
        --scan-period-us <P>
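The CLI shape above maps onto a small typed configuration. A std-only sketch follows. The real xtask may well use an argument-parsing crate instead. `LoadProfile`, `BenchArgs`, `parse_args` and `parse_num` are hypothetical names, and requiring every flag (no defaults) is an assumption of this sketch.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Values accepted by --load-profile (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoadProfile {
    Idle,
    CpuStress,
    CyclictestCoexist,
}

/// Typed form of the preempt-rt-bench flags (hypothetical struct).
#[derive(Debug, PartialEq, Eq)]
pub struct BenchArgs {
    pub load_profile: LoadProfile,
    pub cycle_count: u64,
    pub task_count: u32,
    pub scan_period_us: u64,
}

/// Parse a numeric flag value, naming the flag in the error message.
fn parse_num<T>(flag: &str, value: &str) -> Result<T, String>
where
    T: FromStr,
    T::Err: Display,
{
    value.parse::<T>().map_err(|e| format!("{flag}: {e}"))
}

/// Parse `--flag value` pairs; in this sketch every flag is required.
pub fn parse_args<I, S>(args: I) -> Result<BenchArgs, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let (mut profile, mut cycles, mut tasks, mut period) = (None, None, None, None);
    let mut it = args.into_iter();
    while let Some(flag) = it.next() {
        let flag = flag.as_ref();
        let value = it.next().ok_or_else(|| format!("missing value for {flag}"))?;
        let value = value.as_ref();
        match flag {
            "--load-profile" => {
                profile = Some(match value {
                    "idle" => LoadProfile::Idle,
                    "cpu-stress" => LoadProfile::CpuStress,
                    "cyclictest-coexist" => LoadProfile::CyclictestCoexist,
                    other => return Err(format!("unknown load profile: {other}")),
                });
            }
            "--cycle-count" => cycles = Some(parse_num::<u64>(flag, value)?),
            "--task-count" => tasks = Some(parse_num::<u32>(flag, value)?),
            "--scan-period-us" => period = Some(parse_num::<u64>(flag, value)?),
            other => return Err(format!("unknown flag: {other}")),
        }
    }
    Ok(BenchArgs {
        load_profile: profile.ok_or("missing --load-profile")?,
        cycle_count: cycles.ok_or("missing --cycle-count")?,
        task_count: tasks.ok_or("missing --task-count")?,
        scan_period_us: period.ok_or("missing --scan-period-us")?,
    })
}
```

Parsing straight into an enum rejects an unknown load profile before any measurement starts, rather than midway through a long run.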
The harness installs a custom
New workspace member ``xtask/preempt-rt/``
New document ``docs/preempt-rt-procedure.md`` (deferred to the implementation phase; it will be written when the first measurement run is staged, so that the procedure can reflect the actual host). Sections planned:
Verification
Cycle-overrun fault primitive (FEAT_0018)
New module
Implementation in
Implementation in
Four new
New
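At its core, an overrun primitive of this kind needs only a comparison and an atomic increment on the cycle path, with no allocation. The following is a minimal sketch. It assumes that the executor measures each cycle's elapsed time against its scan period, and that a cycle lasting exactly one period is not an overrun. `OverrunCounter` and `record_cycle` are hypothetical names, not the module listed above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Per-task overrun statistics, updated from the cycle thread and readable
/// concurrently by a telemetry reader (hypothetical type).
#[derive(Default)]
pub struct OverrunCounter {
    overruns: AtomicU64,
    worst_excess_ns: AtomicU64,
}

impl OverrunCounter {
    /// Record one completed cycle and return true if it overran its period.
    /// Allocation-free: one comparison and at most two atomic operations.
    pub fn record_cycle(&self, elapsed: Duration, period: Duration) -> bool {
        if elapsed <= period {
            return false;
        }
        self.overruns.fetch_add(1, Ordering::Relaxed);
        // Saturate rather than truncate if the excess exceeds u64 nanoseconds.
        let excess = (elapsed - period).as_nanos().min(u64::MAX as u128) as u64;
        self.worst_excess_ns.fetch_max(excess, Ordering::Relaxed);
        true
    }

    /// Total overruns recorded so far.
    pub fn overruns(&self) -> u64 {
        self.overruns.load(Ordering::Relaxed)
    }

    /// Largest amount by which any cycle exceeded its period, ns.
    pub fn worst_excess_ns(&self) -> u64 {
        self.worst_excess_ns.load(Ordering::Relaxed)
    }
}
```

`Relaxed` ordering is enough here because each counter is an independent statistic. A reader may briefly see an incremented overrun count before the matching worst-excess update.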
Framework internal-fault model (FEAT_0024)
Architecture Decision: Abort on framework-invariant violation; watchdog drives outputs safe (ADR_0065)
Context. The cyclic dispatch path has two nested
Because user-item panics are already neutralised by the inner layer,
the only panics that can reach the outer layer are framework-internal:
a poisoned dispatch
The runtime stays on

Decision. Treat any panic reaching the outer (framework) boundary
as a non-recoverable internal-invariant violation and fail fast:
invoke a best-effort, time-bounded user fatal handler
(User-registered fatal handler (REQ_0125)), then
The documented output failure model on abort is:

Alternatives considered.
Consequences. ✅ Infrastructure panics can no longer silently hang the executor;
they become an immediate, observable process abort.
✅ The fail-fast path is exercisable in CI via an injected fatal
handler (User-registered fatal handler (REQ_0125)), so it does not rot.
✅ The output-safe guarantee does not depend on any taktora code running after the panic.
❌ The output-safe timing is bounded by the slave watchdog, not by
taktora; correctness rests on Output-slave watchdog enabl... (AOU_0016) holding. Enforcement
of the ≤ FTTI/2 bound is deferred until the SM watchdog is modelled.
❌
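The fail-fast sequence in the decision above can be sketched with the standard library alone: first a best-effort, time-bounded fatal handler, then an abort. The sketch rests on three assumptions. The fatal handler is a `FnOnce(&str) + Send` closure. Its time budget is enforced by running it on a helper thread and waiting with `mpsc::Receiver::recv_timeout`. `std::process::abort` stands in for the runtime's actual abort path. `run_fatal_handler_bounded` and `fail_fast` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run the user fatal handler on a helper thread and wait at most `budget`.
/// Returns true if the handler finished in time. A handler that overruns is
/// abandoned (threads cannot be cancelled); the caller aborts regardless.
pub fn run_fatal_handler_bounded<H>(handler: H, reason: String, budget: Duration) -> bool
where
    H: FnOnce(&str) + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel::<()>();
    let spawned = thread::Builder::new()
        .name("fatal-handler".into())
        .spawn(move || {
            handler(&reason);
            // Signal completion; if the handler panicked, done_tx is dropped
            // during unwinding and the receiver sees Disconnected instead.
            let _ = done_tx.send(());
        });
    match spawned {
        Ok(_) => done_rx.recv_timeout(budget).is_ok(),
        // Could not even spawn the helper: skip the handler.
        Err(_) => false,
    }
}

/// Outer-boundary fail-fast: best-effort bounded handler, then abort.
/// Never returns, so no framework code runs after this point.
pub fn fail_fast<H>(handler: H, reason: String, budget: Duration) -> !
where
    H: FnOnce(&str) + Send + 'static,
{
    let _ = run_fatal_handler_bounded(handler, reason, budget);
    std::process::abort()
}
```

Spawning the helper thread allocates. That is acceptable in a sketch, but it is one reason a production runtime might pre-spawn the thread instead. Output safety rests on the slave watchdog (AOU_0016) rather than on this handler, so a handler that misses its budget is simply abandoned.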
The outer (framework) panic boundary, realised at the top of every runtime
thread: the pool worker loop and the inline-submit path in
Replace the swallowing
Existing
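The two nested boundaries can be illustrated with a worker loop. An inner `catch_unwind` around each user item contains and counts the fault. An outer `catch_unwind` around the whole loop has a single job: to hand a framework-internal panic to the fail-fast path rather than swallow it. The sketch makes three assumptions. User items are `FnMut()` closures. `framework_step` stands in for framework code run once per cycle. Where the runtime would abort, the outer branch here returns a value, purely so the behaviour is observable. `run_worker` and `WorkerExit` are hypothetical names.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// How the worker loop ended (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
pub enum WorkerExit {
    /// All cycles ran; carries the number of user-item panics contained.
    Completed { user_faults: u32 },
    /// A panic reached the outer boundary: framework invariant violated.
    InternalPanic(String),
}

/// Extract a readable message from a panic payload.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Run `cycles` dispatch rounds over `items`. A panic in `framework_step`
/// is outside the inner layer, so it reaches the outer boundary.
pub fn run_worker(
    items: &mut [Box<dyn FnMut()>],
    cycles: u32,
    mut framework_step: impl FnMut(u32),
) -> WorkerExit {
    let mut user_faults = 0u32;
    // Outer (framework) boundary: wraps the whole loop.
    let outer = panic::catch_unwind(AssertUnwindSafe(|| {
        for cycle in 0..cycles {
            framework_step(cycle);
            for item in items.iter_mut() {
                // Inner (user-item) boundary: contain, count, keep running.
                if panic::catch_unwind(AssertUnwindSafe(|| (**item)())).is_err() {
                    user_faults += 1;
                }
            }
        }
    }));
    match outer {
        Ok(()) => WorkerExit::Completed { user_faults },
        // In the runtime this branch would enter the fail-fast path and abort.
        Err(payload) => WorkerExit::InternalPanic(panic_message(&*payload)),
    }
}
```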