Runtime diagnostics (medkit)
Architecture of taktora-medkit — the SOVD-aligned runtime-diagnostics
surface specified by Runtime diagnostics (SOVD-a... (FEAT_0100). This page records the load-bearing
architectural decision (the off-path boundary and extractable-core layout) and
the crate-level building blocks that realise it.
Solution strategy
arc42 §4.
Context. taktora’s value is a bounded-time control path: the taktora-executor WaitSet loop must not allocate or block in steady state (Allocation-free telemetry u... (REQ_0104)). A diagnostics surface wants the opposite: a REST server, JSON serialisation, an async runtime, unbounded client behaviour. Wiring that directly into the runtime would put network and allocation latency on the control path. Separately, medkit is a clean-room take on the ros2_medkit contract and may later live as its own project; if taktora types leak into its model and gateway, that extraction becomes a detangling exercise.

Decision. Quarantine the two concerns by crate boundary. The core
crates — Consequences. ✅ The control path stays bounded — diagnostics can stall
without perturbing the machine. ✅ The core folder lifts out via
Context. The walking skeleton (GitHub #81) must take HTTP in and emit
contract-correct SOVD JSON out, backed by the mock provider, and prove it
against

Decision. (1) Introduce a plain-data

Consequences. ✅ #82/#83/#84 have an obvious seam: add a snapshot source
or a manifest step to the pipeline without touching the HTTP layer. ✅ The
resolvers stay pure and transport-neutral, testable without a socket.
✅ Shape-diffing tolerates the corpus’s internal inconsistency while still
catching envelope/casing/structure regressions. ❌ The gateway test does not
assert exact bytes, so a value-level divergence within a correct shape would
pass there (the model crate’s byte tests cover that axis). ❌ Server-rendered
views the model does not carry (the single-entity capability catalogue, the
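
The shape-diffing idea described above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual test helper: the `Json` and `Shape` types, `shape_of`, `same_shape`, and `obj` are hypothetical names, and a real test would more likely build on a JSON crate. The sketch erases leaf values and compares only key names, casing, and nesting, so value-level differences pass while envelope or casing regressions fail.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value (hypothetical stand-in for a JSON crate's value type).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Structural skeleton of a value: key names and value kinds, leaf values erased.
#[derive(Debug, Clone, PartialEq)]
enum Shape {
    Null,
    Bool,
    Num,
    Str,
    /// Shape of the first element, if the array is non-empty (an assumption of this sketch).
    Arr(Option<Box<Shape>>),
    Obj(BTreeMap<String, Shape>),
}

fn shape_of(v: &Json) -> Shape {
    match v {
        Json::Null => Shape::Null,
        Json::Bool(_) => Shape::Bool,
        Json::Num(_) => Shape::Num,
        Json::Str(_) => Shape::Str,
        Json::Arr(items) => Shape::Arr(items.first().map(|i| Box::new(shape_of(i)))),
        Json::Obj(fields) => Shape::Obj(
            fields
                .iter()
                .map(|(k, v)| (k.clone(), shape_of(v)))
                .collect(),
        ),
    }
}

/// True when both documents share envelope, key casing, and nesting.
fn same_shape(a: &Json, b: &Json) -> bool {
    shape_of(a) == shape_of(b)
}

/// Convenience constructor for object literals in tests.
fn obj(fields: &[(&str, Json)]) -> Json {
    Json::Obj(
        fields
            .iter()
            .map(|(k, v)| (k.to_string(), v.clone()))
            .collect(),
    )
}
```

Because values are erased, a check like this cannot catch a wrong value inside a correct shape, which is the gap the consequences above assign to the model crate's byte tests.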
Context. The executor binding (GitHub #83) must record App / executor
liveness and per-task timing from taktora-executor

Decision. Use a fixed set of pre-allocated per-task slots, each a bag
of atomics (liveness state, lifecycle counters, last / EWMA / min / max
execution duration, scan period), rather than an overwrite ring of
observation records. Tasks are registered up front (

Consequences. ✅ The hook path is provably allocation-free and lock-free,
asserted by a counting-allocator differential test (Hook write path performs ze... (TEST_0914)).
✅ Memory is bounded to the registered task count, fixed at construction.
✅ A stalled or slow gateway reader can never back-pressure or perturb the
control path. ❌ Tasks must be known up front; an item whose
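
A slot of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not the binding's real layout: `TaskSlot`, `TaskTable`, the field names, the `u8` liveness encoding, and the 1/8 EWMA weight are all hypothetical, the scan-period field is omitted for brevity, and the EWMA update assumes a single writer per slot.

```rust
use std::sync::atomic::{AtomicU64, AtomicU8, Ordering};

/// EWMA weight of 1/8, expressed as a shift (assumed value for this sketch).
const ALPHA_SHIFT: u32 = 3;

/// One pre-allocated slot per registered task: a bag of atomics.
struct TaskSlot {
    liveness: AtomicU8, // 0 = never ran, 1 = has run (hypothetical encoding)
    runs: AtomicU64,
    last_ns: AtomicU64,
    ewma_ns: AtomicU64,
    min_ns: AtomicU64,
    max_ns: AtomicU64,
}

impl TaskSlot {
    fn new() -> Self {
        Self {
            liveness: AtomicU8::new(0),
            runs: AtomicU64::new(0),
            last_ns: AtomicU64::new(0),
            ewma_ns: AtomicU64::new(0),
            min_ns: AtomicU64::new(u64::MAX),
            max_ns: AtomicU64::new(0),
        }
    }

    /// Hook-side write: atomic loads, stores and read-modify-writes only,
    /// so it neither allocates nor takes a lock.
    fn record(&self, dur_ns: u64) {
        self.liveness.store(1, Ordering::Relaxed);
        let previous_runs = self.runs.fetch_add(1, Ordering::Relaxed);
        self.last_ns.store(dur_ns, Ordering::Relaxed);
        self.min_ns.fetch_min(dur_ns, Ordering::Relaxed);
        self.max_ns.fetch_max(dur_ns, Ordering::Relaxed);
        // Single-writer assumption: only the owning task updates its EWMA.
        let prev = self.ewma_ns.load(Ordering::Relaxed);
        let next = if previous_runs == 0 {
            dur_ns
        } else {
            prev - (prev >> ALPHA_SHIFT) + (dur_ns >> ALPHA_SHIFT)
        };
        self.ewma_ns.store(next, Ordering::Relaxed);
    }
}

/// Fixed-size table: the only allocation happens here, at construction.
struct TaskTable {
    slots: Box<[TaskSlot]>,
}

impl TaskTable {
    fn with_tasks(count: usize) -> Self {
        Self {
            slots: (0..count).map(|_| TaskSlot::new()).collect(),
        }
    }

    /// Unregistered ids have no slot, mirroring the "known up front" limit.
    fn slot(&self, id: usize) -> Option<&TaskSlot> {
        self.slots.get(id)
    }
}
```

A reader on the gateway side would only ever load these atomics, which is why a slow or stalled reader cannot push back on the writer.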
Context. Allocation-freedom is asserted across the workspace by
counting-global-allocator differential tests (the

Decision. Pre-1.0, verified zero-allocation (a counting-allocator
regression test in CI) is required only where the property is load-bearing:
executor scope (the dispatch / telemetry fold on the

Alternatives considered.

Consequences. ✅ The flake class leaves the medkit suite; CI signal-to-noise improves without touching the binding’s design. ✅ The enforcement boundary is now explicit and citable for future subsystems (new no-alloc tests need an executor- or connector-scope justification). ❌ An allocation regression introduced into the medkit hook write path is no longer caught by CI; re-verifying Non-blocking, bounded hook ... (REQ_0925) (with a thread-scoped harness) is deferred to the 1.0 hardening pass.
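
For orientation, a thread-scoped counting harness of the kind mentioned above could be sketched as follows. This is an assumption about what such a harness might look like, not the workspace's existing test code: `CountingAlloc`, `ALLOCS`, and `allocations_during` are hypothetical names. Counting per thread rather than per process means allocations made concurrently by other test threads do not disturb the differential.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    /// Allocations made by the current thread only.
    static ALLOCS: Cell<u64> = const { Cell::new(0) };
}

/// Wraps the system allocator and counts calls on the calling thread.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `try_with` avoids a panic if the thread-local is already torn down.
        let _ = ALLOCS.try_with(|count| count.set(count.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Differential measurement: allocations on this thread while `f` runs.
fn allocations_during<R>(f: impl FnOnce() -> R) -> (R, u64) {
    let before = ALLOCS.with(|count| count.get());
    let out = f();
    let after = ALLOCS.with(|count| count.get());
    (out, after - before)
}
```

A test would wrap the code under scrutiny in `allocations_during` and assert that the returned count is zero.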
Context. medkit v1 does no service discovery (raw iceoryx2 introspection
is out of scope), so nothing enumerates the system to supply the
Area/Component grouping the SOVD tree hangs on. The bindings emit only flat,
raw entities (

Decision. Introduce a sibling core crate

Consequences. ✅ The grouping is declarative and lives in one place; ops
edit
Context. The connector binding (GitHub #84) must turn a connector’s
Decision. (1) Model the input as a health event stream the binding
ingests (

Consequences. ✅ The binding is testable with a simulated transition
sequence and pluggable onto a real per-connector health surface later.
✅ Reason strings flow through unchanged, so new degraded conditions need no
binding change. ✅ DTC memory gives a maintenance history (occurrence counts,
heal/raise) rather than a momentary view. ❌ A wall-clock timestamp must be
supplied alongside each event, since the connector’s
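
The consequences above suggest a small stateful mapper. The sketch below is one way it could look, under assumptions: `Health`, `HealthEvent`, `Dtc`, and `DtcMemory` are hypothetical names, the fault is keyed by connector and reason string, and the caller supplies the wall-clock timestamp with each event. Reason strings are stored unchanged, and each raise after a heal bumps the occurrence count.

```rust
use std::collections::BTreeMap;

/// Connector health as seen by the binding (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum Health {
    Up,
    Degraded(String), // reason string, passed through unchanged
}

/// One transition, with the wall-clock time supplied by the caller.
struct HealthEvent {
    connector: String,
    health: Health,
    at_unix_ms: u64,
}

/// DTC memory entry: a maintenance history, not a momentary view.
#[derive(Debug, Clone, PartialEq)]
struct Dtc {
    reason: String,
    active: bool,
    occurrences: u32,
    first_seen_ms: u64,
    last_change_ms: u64,
}

#[derive(Default)]
struct DtcMemory {
    /// Keyed by (connector, reason); this keying is an assumption of the sketch.
    faults: BTreeMap<(String, String), Dtc>,
}

impl DtcMemory {
    fn ingest(&mut self, ev: HealthEvent) {
        match ev.health {
            // Heal: clear every active fault of this connector, keep the history.
            Health::Up => {
                for ((connector, _), dtc) in self.faults.iter_mut() {
                    if *connector == ev.connector && dtc.active {
                        dtc.active = false;
                        dtc.last_change_ms = ev.at_unix_ms;
                    }
                }
            }
            // Raise: create the entry on first sight, count each new occurrence.
            Health::Degraded(reason) => {
                let dtc = self
                    .faults
                    .entry((ev.connector.clone(), reason.clone()))
                    .or_insert(Dtc {
                        reason,
                        active: false,
                        occurrences: 0,
                        first_seen_ms: ev.at_unix_ms,
                        last_change_ms: ev.at_unix_ms,
                    });
                if !dtc.active {
                    dtc.active = true;
                    dtc.occurrences += 1;
                    dtc.last_change_ms = ev.at_unix_ms;
                }
            }
        }
    }
}
```

Feeding a simulated transition sequence into `ingest` is enough to exercise the mapping without a real connector.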
Architecture Decision: Additive freeze-frame seam through the snapshot (fault_environments) ADR_0116
Context. Connector health → DTC mapp... (ADR_0115) left a documented gap:

Decision. Carry per-fault environment data through the snapshot seam
additively: add

Consequences. ✅ The freeze-frame reaches clients through the proper SOVD
fault-detail endpoint, closing the Connector health → DTC mapp... (ADR_0115) gap. ✅ The change is
purely additive: the fault-list wire contract and every existing provider
compile and behave unchanged. ✅ The
Context. The captured fault-stream golden
(

Decision. Split authority by layer. The frame envelope and the
data-object shape are authoritative from the golden and are reproduced
byte-for-byte (the

Consequences. ✅ A drop-in
Context. A drop-in SOVD client authenticates via

Decision. Ship an auth-light v1 that preserves the client login flow
behind a seam. Introduce an

Consequences. ✅ A
Architecture Decision: Diagnostic write surface gated by Freedom-From-Interference; only QM-scoped families are v1 ADR_0119
Context. medkit is a QM-grade, off-control-path diagnostic surface (Off-path / freedom from int... (REQ_0910), Off-path diagnostics bounda... (ADR_0111)). Adding SOVD write/action families would let a diagnostic client cause an effect on a live real-time control system for the first time.

Decision. A QM→SC write is forbidden by the safety argument:
Directional channel topology (AFSR_0002) (a reader of integrity

Alternatives considered. (a) Ship all write families with gateway-side guards only: rejected, violates Directional channel topology (AFSR_0002). (b) Cohost a write path in the SC process now: rejected per Process boundary as spatial... (ADR_0050). (c) Defer the entire write surface indefinitely: rejected; locks is safe and unblocks future coordinated writes (issue #149).

Consequences. ✅ The QM→SC boundary stays a hard safety invariant rather than a per-feature judgement call. ✅ locks (Locks are diagnostic-coordi... (ADR_0120)) ships now as the one zero-SC-coupling family. ❌ Every other write family carries a HARA + SC-gate prerequisite before it can land.
Context. Of the six SOVD write families, locks is the only one that can be built clean-room in v1 without a HARA update (Diagnostic write surface ga... (ADR_0119)): a lock coordinates diagnostic clients against each other and governs no safety-critical resource.

Decision. The lock registry (diagnostic lock registry (BB_0113), issue #149) is in-memory,
off the control path, and guards nothing SC. It is pure QM coordination
metadata: at most one live lock per

Alternatives considered. (a) Back locks with an SC-managed resource handle so they actually arbitrate control access: rejected; that is exactly the QM→SC write that Diagnostic write surface ga... (ADR_0119) forbids without the full gate. (b) Persist locks across restarts: deferred; in-memory is sufficient for diagnostic-session coordination and adds no durability surface.

Consequences. ✅ locks ships as a strictly-QM v1 feature with no HARA impact. ✅ The extractable core is preserved. ❌ Locks provide no guarantee against a non-diagnostic actor (e.g. the SC process itself); they coordinate diagnostic clients only, by design.
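
An in-memory registry with that "at most one live lock" property could be sketched as below. The scope key, the expiry field, and every name here (`LockRegistry`, `Lock`, `LockError`, `acquire`, `release`) are assumptions made for illustration; the source does not say what a lock is scoped to or whether it expires. Time is passed in explicitly so the sketch stays deterministic.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum LockError {
    /// A different client already holds a live lock on this scope.
    HeldBy(String),
}

struct Lock {
    owner: String,
    expires_at: u64, // seconds on the caller's clock (hypothetical expiry model)
}

/// Pure coordination metadata: guards nothing outside this map.
#[derive(Default)]
struct LockRegistry {
    locks: HashMap<String, Lock>,
}

impl LockRegistry {
    /// Grants the lock unless another owner holds an unexpired one.
    /// Re-acquiring as the same owner extends the expiry.
    fn acquire(&mut self, scope: &str, owner: &str, now: u64, ttl: u64) -> Result<(), LockError> {
        if let Some(existing) = self.locks.get(scope) {
            if existing.expires_at > now && existing.owner != owner {
                return Err(LockError::HeldBy(existing.owner.clone()));
            }
        }
        self.locks.insert(
            scope.to_string(),
            Lock {
                owner: owner.to_string(),
                expires_at: now + ttl,
            },
        );
        Ok(())
    }

    /// Only the current owner can release; returns whether a lock was removed.
    fn release(&mut self, scope: &str, owner: &str) -> bool {
        let owned = self
            .locks
            .get(scope)
            .map_or(false, |lock| lock.owner == owner);
        if owned {
            self.locks.remove(scope);
        }
        owned
    }
}
```

Nothing in the registry touches the control path, which matches the decision that locks arbitrate between diagnostic clients only.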
Context. A gap analysis of the served surface against the captured
contract found breaks inside families already nominally implemented: the
global

Decision. Land these as one cohesive parity pass (Global fault SSE stream (REQ_0961) –
Auth disable parity (REQ_0968)), all in

Alternatives. (a) Treat the trigger surface as “good enough” globally:
rejected; a path-hardcoding client reaches triggers only at

Consequences. ✅ A read/coordination
Context. The write/action families (operations, configurations,
bulk-data, scripts, updates, lifecycle) were

Decision. Model the write side as a ports-&-adapters seam mirroring the
read

Alternatives. (a) Wait for the per-family safety cases before any write code: rejected; it blocks the entire surface and its client integration on work that is months out. (b) Implement effects now behind a runtime check: rejected; that is the QM→SC write that Diagnostic write surface ga... (ADR_0119) forbids without the full gate.

Consequences. ✅ The write surface is shape-complete, wire-compatible, and
fully testable against the simulation today. ✅ The safety boundary is
preserved: the only backend performs no effect, and the gate has a defined
insertion point. ❌ A client cannot yet cause a real effect (by design). ❌
The advertised
Context. After the read core, Tier-A parity, and the write plane, four
read thin spots remained: the deferred

Decision. Reuse the established seams rather than invent new ones:
logs entries flow through the read

Consequences. ✅ The SOVD read surface is contract-complete (only the
genuinely-out-of-scope families remain
Architecture Decision: Compile-time build identity, captured in a leaf crate and injected as data ADR_0132
Context. The version catalogue reported only the crate semver
(

Decision. Capture at compile time, not runtime. A leaf

Consequences. ✅ A running binary reports its exact commit, dirty state,
and build time; a field issue traces to source. ✅ The extractable core keeps
zero edge to
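
As a rough illustration of "captured at compile time, injected as data", the sketch below reads build facts through the standard `option_env!` macro and hands them on as a plain struct. The environment variable names (`MEDKIT_BUILD_COMMIT`, `MEDKIT_BUILD_DIRTY`, `MEDKIT_BUILD_TIME`), the `BuildIdentity` struct, and both functions are hypothetical; in practice a build script would be expected to set such variables, and the real crate may capture them differently.

```rust
/// Plain data describing the build; no dependency on any runtime crate.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BuildIdentity {
    commit: Option<&'static str>,
    dirty: Option<bool>,
    built_at: Option<&'static str>,
}

/// Leaf-crate side: values are baked in when this crate is compiled.
/// Each field is `None` if the (hypothetical) variable was not set at build time.
fn captured() -> BuildIdentity {
    BuildIdentity {
        commit: option_env!("MEDKIT_BUILD_COMMIT"),
        dirty: option_env!("MEDKIT_BUILD_DIRTY").map(|v| v == "true"),
        built_at: option_env!("MEDKIT_BUILD_TIME"),
    }
}

/// Consumer side: receives the identity as data and only formats it.
fn describe(semver: &str, id: &BuildIdentity) -> String {
    let commit = id.commit.unwrap_or("unknown");
    let dirty = match id.dirty {
        Some(true) => "-dirty",
        _ => "",
    };
    let built = id.built_at.unwrap_or("unknown");
    format!("{semver} ({commit}{dirty}, built {built})")
}
```

Keeping `describe` free of any capture logic is what lets the consuming crate stay independent of how, or where, the identity was recorded.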
Building block view
arc42 §5.
The diagnostics surface decomposes into four extractable core crates and two binding crates. Core crates depend only on each other and external crates; binding crates additionally depend on taktora runtime crates and on the provider seam.
Wire DTOs for the SOVD surface: the entity tree (Area / Component /
Function / App), the DTC/fault model (status sub-object, severity,
occurrence count, reporting sources), freeze-frame / snapshot environment
data, and the reusable collection envelope.

The data-source seam: a

Transport-neutral read-diagnostic core. A

The HTTP surface: an axum router exposing the gateway’s read-core resolvers
over the ros2_medkit REST contract on the

The live-push slice inside

The mandatory Area/Component grouping manifest: a type-safe builder core and a
TOML loader (over a committed

Sources liveness and timing from taktora-executor
The lifecycle hooks (

Maps connector-framework
The binding is a stateful

The gateway’s authentication seam, inside

The lock-registry slice inside
A bounded (100-event) replay ring retained alongside the change-event
broadcast in
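
A bounded replay ring of this size could be sketched as follows. Only the capacity of 100 comes from the text above; the `ReplayRing` type, the monotonically increasing event id, and the "replay everything after the last id the client saw" semantics are assumptions of this sketch.

```rust
use std::collections::VecDeque;

/// Capacity stated for the replay ring.
const REPLAY_CAPACITY: usize = 100;

/// Keeps the most recent events so a reconnecting client can catch up.
struct ReplayRing {
    next_id: u64,
    events: VecDeque<(u64, String)>,
}

impl ReplayRing {
    fn new() -> Self {
        Self {
            next_id: 1,
            events: VecDeque::with_capacity(REPLAY_CAPACITY),
        }
    }

    /// Stores an event, evicting the oldest once the ring is full.
    /// Returns the id assigned to the event.
    fn push(&mut self, payload: String) -> u64 {
        if self.events.len() == REPLAY_CAPACITY {
            self.events.pop_front();
        }
        let id = self.next_id;
        self.next_id += 1;
        self.events.push_back((id, payload));
        id
    }

    /// Events newer than `last_seen`, oldest first. Events already evicted
    /// are simply absent, so a long-disconnected client sees a gap.
    fn replay_after(&self, last_seen: u64) -> impl Iterator<Item = &(u64, String)> {
        self.events.iter().filter(move |event| event.0 > last_seen)
    }
}
```

The bound keeps memory fixed regardless of how long clients stay disconnected, at the cost of losing events older than the last hundred.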
The command-side seam, mirroring the read
The surfaces that bring the read side to contract fidelity (Read-family completion — se... (ADR_0127)).
In

A leaf
Crate-level building blocks and their dependency edges (depender → dependee).
The graph is acyclic and the cut between core and binding crates is the
extraction seam: every edge crossing into
graph TD
axum[taktora-medkit-gateway-axum] --> gw[taktora-medkit-gateway]
axum --> manifest[taktora-medkit-manifest]
gw --> prov[taktora-medkit-provider]
gw --> model[taktora-medkit-model]
gw --> manifest
manifest --> model
prov --> model
be[taktora-medkit-binding-executor] --> prov
be --> exec[taktora-executor]
bc[taktora-medkit-binding-connector] --> prov
bc --> conn[taktora-connector-core]
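
The two properties claimed for this graph, acyclicity and a clean core/binding cut, can be checked mechanically. The sketch below copies the edges from the diagram and tests both. The classification by crate-name prefix (`taktora-medkit-binding-` for bindings, any other `taktora-medkit-` crate for core) is an assumption of this sketch, as are the function names.

```rust
use std::collections::BTreeMap;

/// Dependency edges (depender, dependee), copied from the diagram.
const EDGES: &[(&str, &str)] = &[
    ("taktora-medkit-gateway-axum", "taktora-medkit-gateway"),
    ("taktora-medkit-gateway-axum", "taktora-medkit-manifest"),
    ("taktora-medkit-gateway", "taktora-medkit-provider"),
    ("taktora-medkit-gateway", "taktora-medkit-model"),
    ("taktora-medkit-gateway", "taktora-medkit-manifest"),
    ("taktora-medkit-manifest", "taktora-medkit-model"),
    ("taktora-medkit-provider", "taktora-medkit-model"),
    ("taktora-medkit-binding-executor", "taktora-medkit-provider"),
    ("taktora-medkit-binding-executor", "taktora-executor"),
    ("taktora-medkit-binding-connector", "taktora-medkit-provider"),
    ("taktora-medkit-binding-connector", "taktora-connector-core"),
];

/// Assumed naming rule: bindings carry a `binding-` infix.
fn is_binding(name: &str) -> bool {
    name.starts_with("taktora-medkit-binding-")
}

/// Assumed naming rule: every other medkit crate is extractable core.
fn is_core(name: &str) -> bool {
    name.starts_with("taktora-medkit-") && !is_binding(name)
}

/// Edges where a core crate depends on anything that is not core.
fn seam_violations(edges: &[(&'static str, &'static str)]) -> Vec<(&'static str, &'static str)> {
    edges
        .iter()
        .copied()
        .filter(|(from, to)| is_core(from) && !is_core(to))
        .collect()
}

/// Kahn's algorithm: the graph is acyclic iff every node can be removed
/// once all edges pointing at it have been removed.
fn is_acyclic(edges: &[(&str, &str)]) -> bool {
    let mut indegree: BTreeMap<&str, usize> = BTreeMap::new();
    for &(from, to) in edges {
        indegree.entry(from).or_insert(0);
        *indegree.entry(to).or_insert(0) += 1;
    }
    let mut ready: Vec<&str> = indegree
        .iter()
        .filter(|(_, degree)| **degree == 0)
        .map(|(name, _)| *name)
        .collect();
    let mut visited = 0;
    while let Some(node) = ready.pop() {
        visited += 1;
        for &(from, to) in edges {
            if from == node {
                let degree = indegree
                    .get_mut(to)
                    .expect("every edge endpoint was inserted above");
                *degree -= 1;
                if *degree == 0 {
                    ready.push(to);
                }
            }
        }
    }
    visited == indegree.len()
}
```

Run against the edges above, both checks pass: only the two binding crates reach `taktora-executor` or `taktora-connector-core`.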