Problem
Across the workflow skills (/cspec, /creview-spec, /ctdd, /cverify, /cupdate-arch, /cdocs, /cauto, /cprune, …) what gets surfaced back to the user at the end of each stage is largely left to model judgment — "surface whatever it thinks is useful." The result is an inconsistent, unpredictable experience: the same stage returns a terse line in one run and a wall of detail in another; important items (decisions made, things deferred, artifacts written, what to do next) are sometimes surfaced and sometimes silently omitted; and the structure differs skill-to-skill, so the user never builds a stable mental model of "what I get at this stage."
This is the shared root of several already-tracked failures — e.g. findings not persisted (#94), calibration telemetry silently dropped (#189), forked skills stalling after presenting proposals (#90). Each is an instance of the same gap: there is no declared contract for what a stage MUST return, persist, and surface, so an omission is a model judgment call rather than a detectable violation.
Proposal
Define an explicit output contract per skill/stage: a declared, ideally machine-checkable specification of what each stage must return to the user and must persist, with a uniform top-level structure shared across all skills. The model fills the content; the contract fixes the shape and the required-to-surface set.
A candidate uniform contract — every user-facing stage ends with these, in this order:
- Result — one line: the stage outcome (e.g.
pass / fail / blocked / N findings / advanced to <phase>).
- What changed / produced — the concrete deliverables of this stage.
- Needs your attention — decisions made autonomously, escalations, deferred items, anything requiring a human call. Always rendered (even if "none") — never omitted at model discretion. This is the load-bearing section: it's exactly what gets silently dropped today.
- Artifacts — paths to everything written/updated (report, findings, manifest, …).
- Next step — the single recommended next action.
Pair the human-readable rendering with a small machine-readable companion (structured fields / a fenced block) so a contract violation is mechanically detectable — the same way /cauto already emits an AUTONOMOUS_DECISIONS_START/END block and the verification report follows a fixed section layout. Generalize that one-off discipline into a framework-level contract that every skill declares and is checked against.
Why
Scope / rollout
Problem
Across the workflow skills (
/cspec,/creview-spec,/ctdd,/cverify,/cupdate-arch,/cdocs,/cauto,/cprune, …) what gets surfaced back to the user at the end of each stage is largely left to model judgment — "surface whatever it thinks is useful." The result is an inconsistent, unpredictable experience: the same stage returns a terse line in one run and a wall of detail in another; important items (decisions made, things deferred, artifacts written, what to do next) are sometimes surfaced and sometimes silently omitted; and the structure differs skill-to-skill, so the user never builds a stable mental model of "what I get at this stage."This is the shared root of several already-tracked failures — e.g. findings not persisted (#94), calibration telemetry silently dropped (#189), forked skills stalling after presenting proposals (#90). Each is an instance of the same gap: there is no declared contract for what a stage MUST return, persist, and surface, so an omission is a model judgment call rather than a detectable violation.
Proposal
Define an explicit output contract per skill/stage: a declared, ideally machine-checkable specification of what each stage must return to the user and must persist, with a uniform top-level structure shared across all skills. The model fills the content; the contract fixes the shape and the required-to-surface set.
A candidate uniform contract — every user-facing stage ends with these, in this order:
pass/fail/blocked/N findings/advanced to <phase>).Pair the human-readable rendering with a small machine-readable companion (structured fields / a fenced block) so a contract violation is mechanically detectable — the same way
/cautoalready emits anAUTONOMOUS_DECISIONS_START/ENDblock and the verification report follows a fixed section layout. Generalize that one-off discipline into a framework-level contract that every skill declares and is checked against.Why
Scope / rollout