Parse the spec as claims: the CI drift guard behind an agent-readable design system

In April I published Supergraphic Panel, the design system behind the chat-arch viewer, as three public URLs you can point a coding agent at. The pitch was: paste a snippet into CLAUDE.md, tell the agent to read spec.md in full, and get the system applied verbatim. (The visual language itself is an LCARS homage; credit to Michael Okuda, as the spec’s attribution section spells out. The transferable part is the methodology, not the amber.)

That pitch has a failure mode I didn’t write about at the time: the spec is prose, the shipped CSS is code, and nothing in the universe keeps prose and code in agreement. This post is about the thing that does: a two-stage drift guard that runs in CI, what each stage catches, the contrast bug that earned its own token, and the bug the guard sailed right past because it couldn’t read it.

Design docs lie

Every design doc drifts. Someone tunes a hex in the stylesheet during a contrast fix and doesn’t touch the doc. Someone rewrites the doc from memory and transposes two values. None of this is malice; it’s entropy, and the doc loses because the doc doesn’t have a compiler.

This used to be a tolerable kind of lying, because the only readers were humans, and humans read documentation with a healthy reflex: assume it’s approximately true, trust the code when they disagree.

A coding agent doesn’t have that reflex unless you build it one. The whole point of the agent workflow is the opposite instruction: read the spec in full before writing any code, use the palette and component patterns verbatim. Verbatim is the word doing the damage. A stale hex in a human-read doc costs you a confused human; a stale hex in an agent-read doc costs you a confidently wrong agent, faithfully propagating the drift into every project that adopts the system. The doc stopped being documentation and became an API. APIs get contract tests.

One source of truth, two generated views

The setup in chat-arch has three artifacts with a strict ordering:

styles.css: the .lcars-root custom-property block. This is where values live. It ships to browsers, so it cannot drift from production by definition.
tokens.json: DTCG 2025.10 format, generated from the CSS by a script that parses the --lcars-* declarations and merges in prescriptive scales (radius, shadow, duration, type sizes) that don’t exist as single CSS variables. Machine-readable, never hand-edited.
spec.md: the canonical prose. It owns everything the other two can’t express: what each color is for (“ice is for data, decorative use forbidden”), the what-not-to-do list, the component patterns. It quotes values, and every quoted value is a liability.
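Here is a minimal sketch of the generator step, not the chat-arch script. It assumes every --lcars-* declaration whose value is a bare hex literal is a color token. It emits a simplified DTCG-flavoured tree ($type/$value with plain hex strings as values). The function name cssToTokens and the grouping rule are hypothetical. Check the DTCG color module for the exact value shape your generator should emit.

```javascript
// Hypothetical sketch of the CSS-to-tokens step. Assumes every --lcars-*
// declaration whose value is a bare hex literal is a color token; the real
// generator's naming, grouping, and DTCG value shape are not reproduced here.
const declPattern = /--lcars-([\w-]+)\s*:\s*([^;]+);/g;

function cssToTokens(css, prescriptiveScales = {}) {
  const tokens = { color: {} };
  for (const [, name, rawValue] of css.matchAll(declPattern)) {
    const value = rawValue.trim();
    // Non-color declarations (spacing, fonts, ...) are skipped in this sketch.
    if (/^#[0-9a-fA-F]{3,8}$/.test(value)) {
      tokens.color[name] = { $type: "color", $value: value.toLowerCase() };
    }
  }
  // Scales with no single CSS variable (radius, shadow, duration, type
  // sizes) are passed in by the caller and merged alongside the colors.
  return { ...tokens, ...prescriptiveScales };
}

// Illustrative input; --lcars-gap is a made-up non-color declaration.
const css = `.lcars-root {
  --lcars-sunflower: #FFCC99;
  --lcars-gap: 8px;
}`;
console.log(JSON.stringify(cssToTokens(css), null, 2));
```

Regex-parsing CSS is fragile: comments and unusual formatting can trip it up. A generator could use a real CSS parser instead, but for a single custom-property block the regex keeps dependencies at zero.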

The drift guard runs at the end of the token generator, which runs in pnpm build, which runs in CI. Drift doesn’t get a grace period; it fails the build the same way a type error would.

Stage 1: every hex must exist

The first check is blunt: scan spec.md for every hex literal, and require each one to appear somewhere in the serialized tokens.json. If the spec cites #ffcc9a and the tokens only know #ffcc99, the build dies with a message that names the fix direction: the prose drifted, styles.css is the source of truth, fix the spec.

Blunt checks have blunt failure modes, and this one’s was the regex. A naive pattern like #[0-9a-fA-F]{3,8} happily matches GitHub issue references (#12345 is five hex-looking digits), which turns ordinary prose edits (“fixed in #12345”) into false-positive build breaks. The hardened version accepts only CSS-valid lengths, longest alternative first, with a word boundary:

const validHex = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{4}|[0-9a-fA-F]{3})\b/g;

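Against #12345, the 8- and 6-digit branches fail for lack of digits. The 4-digit branch matches #1234, and then the \b fails on the trailing 5. The 3-digit fallback fails the same way on the 4. No match, no false alarm. This narrows the problem rather than abolishing it: a 3-, 4-, 6-, or 8-digit reference like #1234 is indistinguishable from a valid CSS hex and still matches. A drift guard that cries wolf on innocent edits gets deleted within a month, so the boring hardening is load-bearing.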
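Put together, stage 1 fits in a few lines. This is a sketch under stated assumptions rather than the real guard. It lowercases both sides before comparing, because the post doesn't say how chat-arch handles case. It treats tokens.json as one serialized string, and it returns error messages instead of exiting. The name checkHexExistence is hypothetical.

```javascript
// Stage 1 sketch: every hex literal quoted in the spec must appear
// somewhere in the serialized tokens. Case-folding is an assumption.
const validHex = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{4}|[0-9a-fA-F]{3})\b/g;

function checkHexExistence(specText, tokensJson) {
  const haystack = tokensJson.toLowerCase();
  const errors = [];
  for (const [hex] of specText.matchAll(validHex)) {
    if (!haystack.includes(hex.toLowerCase())) {
      // Name the fix direction, not just the symptom.
      errors.push(
        `spec.md cites ${hex}, which is not in tokens.json. ` +
          `The prose drifted: styles.css is the source of truth, fix the spec.`
      );
    }
  }
  return errors;
}

const tokensJson = JSON.stringify({ color: { sunflower: { $value: "#ffcc99" } } });
console.log(checkHexExistence("Sunflower is #ffcc9a. Fixed in #12345.", tokensJson));
```

Because the check is a substring search, a 3-digit literal that happens to be a prefix of a 6-digit token (#ffc inside #ffcc99) also passes. Stage 1 is a net for typos, not a proof of correctness.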

Stage 2: the spec is a pile of claims

Here’s the bug class stage 1 cannot see. Suppose an edit swaps the values in two rows of the palette table, so the spec now says sunflower is #dd9944 and butterscotch is #ffcc99, exactly backwards. Both hexes still exist in tokens.json, just under the other name. The substring check passes. The build is green. And the next agent that reads the spec paints body text in the chrome color, exactly as instructed.

The fix is to stop treating the spec as text and start treating it as a set of claims. The palette table already has a strict shape. Every row is “token X has value Y”:

| `color.sunflower` | `#ffcc99` | Primary text, titles ... |

So stage 2 parses those rows as assertions and resolves each one against the real token tree (trimmed from generate-tokens.mjs; the real version accumulates errors instead of failing fast):

// "| `token.path` | `value` | ..." is a claim that the named
// token has the given value. Resolve it and check.
const claimPattern = /^\|\s*`([\w.-]+)`\s*\|\s*`([^`]+)`\s*\|/gm;

for (const m of spec.matchAll(claimPattern)) {
  const [, tokenPath, claimedValue] = m;
  const actual = resolve(tokenPath); // walk tokens.json by dots
  if (actual === undefined) {
    fail(`spec claims '${tokenPath}' exists; tokens.json has no such token`);
  } else if (normalize(actual) !== normalize(claimedValue)) {
    fail(`spec claims ${tokenPath} = '${claimedValue}', ` +
         `but tokens.json has '${actual}'. Semantic swap. Fix the spec.`);
  }
}

Now the sunflower↔butterscotch swap fails loudly, with both the claimed and actual values in the error. The claim grammar is deliberately narrow: only rows whose first two cells are backtick-wrapped count. The WCAG contrast matrix and the typography table use plain cells, because they describe relationships between tokens rather than token values, and the guard ignores them by design. That convention (backticks mean “this is a checkable claim”) is the actual contract between the doc and the build.
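The excerpt above leaves resolve, normalize, and fail undefined. Below is one way to fill them in so the loop runs end to end. It assumes a DTCG-style tree where each leaf carries a string $value. It also assumes normalization means trimming, lowercasing, and expanding #rgb shorthand. These helpers are hypothetical stand-ins, not the chat-arch code. Errors are collected into an array, in the spirit of the accumulate-don't-fail-fast note.

```javascript
// Stage 2 sketch: backtick-wrapped "| `token.path` | `value` |" rows are
// claims, checked against the token tree.
const claimPattern = /^\|\s*`([\w.-]+)`\s*\|\s*`([^`]+)`\s*\|/gm;

// Hypothetical resolver: walk the tree by dots, return the leaf's $value.
function resolve(tokens, tokenPath) {
  let node = tokens;
  for (const key of tokenPath.split(".")) {
    if (node === null || typeof node !== "object" || !(key in node)) return undefined;
    node = node[key];
  }
  return node !== null && typeof node === "object" && "$value" in node ? node.$value : undefined;
}

// Hypothetical normalization (assumes string values): trim, lowercase, #rgb -> #rrggbb.
function normalize(value) {
  const v = String(value).trim().toLowerCase();
  const m = /^#([0-9a-f])([0-9a-f])([0-9a-f])$/.exec(v);
  return m ? `#${m[1]}${m[1]}${m[2]}${m[2]}${m[3]}${m[3]}` : v;
}

function checkClaims(specText, tokens) {
  const errors = [];
  for (const [, tokenPath, claimedValue] of specText.matchAll(claimPattern)) {
    const actual = resolve(tokens, tokenPath);
    if (actual === undefined) {
      errors.push(`spec claims '${tokenPath}' exists; tokens.json has no such token`);
    } else if (normalize(actual) !== normalize(claimedValue)) {
      errors.push(
        `spec claims ${tokenPath} = '${claimedValue}', but tokens.json has ` +
          `'${actual}'. Semantic swap. Fix the spec.`
      );
    }
  }
  return errors;
}
```

Given the swapped sunflower/butterscotch rows from the previous section, checkClaims returns two errors, one per row, each carrying both the claimed and the actual value. Plain-cell rows such as a contrast matrix never match claimPattern, so they are ignored.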

The bug that earned its own token

One palette entry exists purely because of a bug, and it’s the best argument for why these values deserve a guard.

Inactive sidebar items originally rendered as butterscotch at opacity: 0.55. Opacity is a runtime blend: what you see is the element’s color mixed with whatever sits behind it. When the backdrop behind the sidebar shifted, black-text-on-faded-amber contrast fell to 3.1:1, below even WCAG AA’s 4.5:1 floor for body text, on a navigation element.

The fix was to pre-composite by hand: compute sunflower at 85% over black once, and mint the result (#d9ad82) as a solid token, color.sunflower-muted. Black text on it now holds 10.2:1, comfortably past AAA, regardless of what the backdrop does, because solid colors don’t negotiate with their surroundings. The spec states the rule for replicators directly: pre-composite by hand rather than relying on opacity whenever legibility is load-bearing.
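The pre-compositing arithmetic is simple enough to script instead of doing by hand. The sketch below assumes an opaque backdrop, 6-digit hex input, and the WCAG 2.x relative-luminance formula. Compositing sunflower (#ffcc99) at 0.85 over black reproduces #d9ad82. Its contrast against black comes out just above 10.2:1, consistent with the spec's matrix. The helper names are illustrative.

```javascript
// Parse a 6-digit hex like "#ffcc99" into [r, g, b] in 0..255.
function parseHex(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

function toHex(rgb) {
  return "#" + rgb.map((c) => c.toString(16).padStart(2, "0")).join("");
}

// "Over" blend per channel: fg * alpha + bg * (1 - alpha), rounded.
function compositeOver(fgHex, alpha, bgHex = "#000000") {
  const fg = parseHex(fgHex);
  const bg = parseHex(bgHex);
  return toHex(fg.map((c, i) => Math.round(c * alpha + bg[i] * (1 - alpha))));
}

// WCAG 2.x relative luminance of an sRGB color.
function luminance(hex) {
  const [r, g, b] = parseHex(hex).map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(aHex, bHex) {
  const [hi, lo] = [luminance(aHex), luminance(bHex)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

const muted = compositeOver("#ffcc99", 0.85); // "#d9ad82"
console.log(muted, contrastRatio(muted, "#000000").toFixed(2));
```

The same two functions are a cheap way to audit any remaining opacity rules. Composite once over each backdrop the element can sit on, and check the worst case.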

And because the fix is a token with a claim row in the palette table, the guard now defends it. If anyone “simplifies” #d9ad82 away or fat-fingers it in the spec, the build says so before an agent can read the lie.

The one that got away

Now the confession. The guard checks colors: hex literals and palette claims. That is all it checks.

In May 2026 the chrome geometry got rebuilt: v1 rendered the sidebar’s signature corner as two rectangular blocks butted at a square inner corner; v2 replaced that with a single inline SVG L-shape with a concave quarter-circle inner radius (an Elbow.tsx component). A real structural change to the system’s most recognizable shape.

The spec kept describing the v1 geometry. The build stayed green the whole time, because no stage of the guard parses geometry prose. There’s no hex in “two rectangular blocks butted at a square corner.” The drift sat there until a human re-read caught it, and the fix was a hand-written amendment note in the spec.

The lesson generalizes: a drift guard only protects what it parses. Its coverage is exactly its claim grammar, and it is perfectly, silently confident about everything outside that grammar, which is the same failure mode the guard was built to prevent, one level up. The honest responses are to grow the grammar (geometry claims could be tabled too: component, file, shape invariant) or, at minimum, to write down which sections of the spec are unguarded so you know where human re-reads still carry the load. Pretending the green build covers the whole document is how you ship a spec that lies about its own elbow.
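The cheapest version of the "write down what's unguarded" response can itself be generated. Here is a hypothetical sketch. It splits spec.md on markdown headings, which is an assumption about the spec's format. It then reports every non-empty section that contains no backtick-wrapped claim rows. It is naive about fenced code, where a line starting with # would be misread as a heading.

```javascript
// A claim row in the same backtick-wrapped shape stage 2 checks.
const claimRow = /^\|\s*`[\w.-]+`\s*\|\s*`[^`]+`\s*\|/m;

// Return titles of non-empty sections that contain no claim rows.
function unguardedSections(specText) {
  const sections = [];
  let current = { title: "(preamble)", body: [] };
  for (const line of specText.split("\n")) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      sections.push(current);
      current = { title: heading[1].trim(), body: [] };
    } else {
      current.body.push(line);
    }
  }
  sections.push(current);
  return sections
    .map((s) => ({ title: s.title, text: s.body.join("\n") }))
    .filter((s) => s.text.trim() !== "") // skip empty sections
    .filter((s) => !claimRow.test(s.text)) // keep only sections with no claims
    .map((s) => s.title);
}
```

Printing that list at the end of every build keeps the unguarded surface visible, instead of leaving it implied by a green check.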

What the agent workflow looks like

The consumption side is simple and it works: chat-arch.dev publishes an llms.txt discovery index that links the full spec, and the workflow is “read spec.md in full before writing any code, apply the system verbatim”, plus naming the known miss-risks (token scoping, the inline CSS-variable pattern, the asymmetric radii) directly in the prompt. The earlier post has the exact snippets and three ready-to-run recipes.

The point here is what that workflow implies. Agents really do apply quoted values verbatim. That’s what makes the three-URL distribution work, and it’s exactly what makes an unguarded spec dangerous. The same property is the superpower and the threat surface. If you want the first without the second, the doc needs a build step.

Apply this to your design system in an afternoon

None of this is LCARS-specific. The checklist:

Pick one source of truth for values. A CSS custom-property block that actually ships is ideal: it can’t drift from production. Never hand-maintain two value lists.
Generate the machine-readable tokens from it. DTCG JSON, a script, no hand edits. The generator is also where the guard lives.
Run it in the build, not as a side script. A drift check you remember to run is a drift check you’ll stop running.
Stage 1 (existence): every literal value quoted in your prose must exist in the generated tokens. Constrain the regex to valid forms with word boundaries, or issue references and ordinary prose will break your build.
Stage 2 (claims): pick one table convention as your claim grammar (backtick-wrapped token.path | value rows), resolve each path against the token tree, and exact-match values after normalization. This is the stage that catches swapped names, which existence checks structurally cannot.
Make errors name the fix direction. “Fix spec.md: styles.css is the source of truth” turns a red build into a thirty-second fix instead of an investigation.
Let the guard no-op when the doc doesn’t exist yet, so it can land before the spec does.
Write down what the guard does NOT parse. Geometry, motion, layout prose: anything outside the claim grammar is unprotected, and the green build will not tell you. That list is where human review still owes the system its attention.
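Several of the checklist items (run in the build, name the fix direction, no-op without a spec) mostly live in the guard's entry point. Here is a sketch under two assumptions. First, the stages are functions that return arrays of error strings. Second, the caller passes null when spec.md is missing, for example after an fs.existsSync check. The names runDriftGuard and reportAndExit are hypothetical.

```javascript
// Run every stage unless the spec doesn't exist yet (specText === null),
// in which case the guard no-ops so it can land before the spec does.
function runDriftGuard(specText, tokens, stages) {
  if (specText === null) {
    return { ran: false, errors: [] };
  }
  const errors = stages.flatMap((stage) => stage(specText, tokens));
  return { ran: true, errors };
}

// Map the result to an exit code; non-zero fails CI like a type error would.
function reportAndExit(result) {
  if (!result.ran) {
    console.log("drift guard: spec.md not found, skipping");
    return 0;
  }
  for (const e of result.errors) console.error(`drift guard: ${e}`);
  return result.errors.length === 0 ? 0 : 1;
}

// In the generator, roughly:
//   process.exitCode = reportAndExit(runDriftGuard(specOrNull, tokens, [stage1, stage2]));
```

Setting process.exitCode rather than throwing lets the generator finish its work and print every accumulated error before the build goes red.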

The spec is a pile of claims. Some of them your build can check today. Check those, and be honest with yourself about the rest.

The guard described here is design-system/scripts/generate-tokens.mjs in chat-arch; the contrast ratios (3.1:1 before, 10.2:1 after) are from the spec’s WCAG matrix, computed against the shipped values. Supergraphic Panel’s visual language is an unaffiliated homage to Michael Okuda’s LCARS. Full attribution posture in spec §10.