
Encoding the Senior Engineer in the Room: A Design Memo for Tacit Skills

Why a generic AI assistant gives generic answers — and what you can do about it. The five Tacit skills are an attempt to encode the questioning discipline of a specific kind of senior engineer into one slash command per situation.

A senior engineer makes a particular class of decision alone. Architecture review before the architecture review meeting. Postmortem when no one has time to run one. Decision memo when you cannot justify an offsite but you do need a document that survives one.

The default tool for this is a blank chat window. The default output is generic-helpful: have you considered the tradeoffs? what are your constraints? have you explored alternatives? That output is not wrong. It is unspecific — which is the same thing, in this context.

Tacit Skills is five Claude Code slash commands that try to fix this. Each command is a composite persona — someone you would actually want in the room — encoded as a structured prompt with three constraints: questioning discipline, output format, and grounded references.
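On disk, that is five markdown files. Claude Code reads custom slash commands from markdown files in a commands directory; the layout below is a sketch for illustration, and the filenames are assumed rather than read from the repo:

```
.claude/commands/
├── scrutiny.md    # The Reviewer (paranoid staff+ engineer)
├── verdict.md     # The Chief of Staff (decision memos)
├── autopsy.md     # The Investigator (postmortems)
├── fracture.md    # The Interrogator (spec stress-tests)
└── tacit.md       # The Triage Operator (routes to the other four)
```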

[Figure: Five personas, one router, one lonely engineer. Each skill encodes a different professional voice; /tacit routes the situation to the right one.]

This is a memo on what it took to make them work, and where they still fall short.

The failure mode of generic prompts

The first version of /scrutiny was a single prompt: “Act as a paranoid staff engineer reviewing my architecture. Find failure modes.” The model produced a long list immediately. Most items were generic (“what about caching?”). A few asserted behaviors my system did not have. Maybe one was actually useful for the design in front of me.

The fix was not better wording. The fix was structure:

  1. One question per turn. The skill cannot produce its review until I have answered three to seven questions, one at a time, with acknowledgment between each.
  2. Adaptive middle question. The third question branches on what I said in the second. If I described a write-heavy workload, it asks about backpressure. If I described a read-heavy workload, it asks about staleness tolerance.
  3. Grounded references over hedges. Before naming any failure mode, the skill is required to quote or paraphrase the specific detail I gave it. If it cannot ground a claim, it has to say "I don't know." Hedge words still appear in the output, but only attached to a grounded reference — never as a substitute for one.

The third constraint is the one that does the work. It forces grounding. The model is no longer hallucinating failure modes — it is restating my own design back to me with the failure mode highlighted, which is what a senior engineer in the room would do.
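To make that concrete, here is roughly how the three constraints read when encoded in a command file. This is an illustrative sketch, not the published /scrutiny prompt; $ARGUMENTS is Claude Code's placeholder for the text typed after the command.

```markdown
<!-- sketch of a /scrutiny-style command file; not the published skill -->
You are a paranoid staff+ engineer reviewing an architecture before the
formal review.

Questioning discipline:
- Ask between three and seven questions, ONE per turn. Acknowledge each
  answer before asking the next.
- Branch the third question on the answer to the second: write-heavy
  workload -> ask about backpressure; read-heavy -> ask about staleness
  tolerance.

Grounding:
- Before naming any failure mode, quote or paraphrase the specific detail
  from my answers that it rests on.
- If you cannot ground a claim in something I said, answer "I don't know."
  Hedge words may qualify a grounded claim; they may never replace one.

Design under review: $ARGUMENTS
```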


Five personas, five questioning patterns

Each skill encodes a different professional voice:

| Skill | Persona | Questioning pattern |
| --- | --- | --- |
| /scrutiny | The Reviewer (paranoid staff+ engineer) | Failure modes, scaling cliffs, security gaps, ranked by P0/P1/P2 |
| /verdict | The Chief of Staff | Decision memo: options, criteria with explicit weights, recommendation, the tradeoff being accepted |
| /autopsy | The Investigator | Postmortem: timeline with signals, root cause as a mechanism not a story, action items that ship |
| /fracture | The Interrogator | Stress-test a spec for unstated assumptions, contradictions, hidden dependencies |
| /tacit | The Triage Operator | Router — describe your situation, get the right skill |

The personas are not interchangeable. /scrutiny is good at finding things; /verdict is good at choosing among them; /autopsy is good at explaining what happened. Asking /scrutiny to write your decision memo produces a list of risks with no recommendation. That is a feature.

The router (/tacit) exists because users do not always know which one they need. Describe the situation in plain English; the router proposes a plan and runs the right skill.
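A routing exchange looks something like this (an illustrative transcript, not captured output):

```
> /tacit we shipped a bad config on Friday, rolled it back in 40 minutes,
  and nobody has written anything down

That is a postmortem with no postmortem scheduled. Plan: run /autopsy,
reconstruct the timeline from the signals you have, and end with action
items that have owners. Proceed?
```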

The output discipline

Every skill produces a specific markdown structure. Tables for tradeoff analysis. Severity ratings (P0, P1, P2) with explicit definitions. Action items with owners and deadlines, never just "improve monitoring." A 500-1200 word budget, enforced — the discipline of the brief.
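The skeleton of a /scrutiny review, for instance, is roughly this (a sketch of the shape, not a verbatim template from the repo; the findings shown are invented placeholders):

```markdown
## Scrutiny: <system under review>

### P0: breaks under load you already expect
- **Unbounded producer retries.** You said "clients retry on any 5xx with
  no backoff"; a broker stall turns into a retry storm at exactly the
  wrong moment.

### P1: breaks at the next order of magnitude
- ...

### Action items
| # | Item                                        | Owner  | Due    |
|---|---------------------------------------------|--------|--------|
| 1 | Add jittered exponential backoff to retries | <name> | <date> |
```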

The output discipline is what makes these forwardable. The whole point is that the engineer who runs the skill can paste the result into a doc and send it to their VP without having to massage it. If the model produces prose dumps, the engineer becomes the editor, and the skill has failed.

Where this still falls short

I will not pretend this is solved.

Persona consistency degrades over long sessions. After a few back-and-forths, the questioning drifts back toward the generic. I have a re-anchoring step in each skill, but it is imperfect.
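For a feel of what a re-anchoring step looks like, here is one way to phrase it; this is illustrative, not the wording actually shipped in the skills:

```markdown
<!-- illustrative re-anchoring instruction; not the published wording -->
Before every response after the third turn, re-read the persona block
above. If the question you are about to ask could have come from a
generic assistant, discard it and ask one grounded in a specific detail
the user has given.
```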

The evals were not blinded. I tuned each skill against my own examples. A real eval suite would have a held-out set of decisions with known outcomes and would score the skill's recommendations against those. I have not built that. Until I do, my quality claims are judgment, not measurement.

Output quality is bounded by the model. The skills were tuned against Claude Opus. They run on other models — different families exhibit different drift patterns under the same constraints. The persona discipline holds across them; specific output quality varies.

None of these skills replaces the senior engineer in the room. What they replace is you, sitting alone in a hotel room at 11pm, trying to draft a memo before a 9am meeting. That is a smaller and more honest claim.

What I learned

The work was not in the prompts. The work was in the constraints.

The questioning discipline (one at a time, adaptive, grounded) is what produces specificity. The output template is what makes it forwardable. The persona is what makes it sound like a person rather than a tool. None of these came from “make the prompt longer.” All of them came from removing the model’s degrees of freedom.

This is the inverse of how most prompt-engineering content reads. The advice tends to be: give the model more context, more examples, more flexibility. My experience is the opposite — for a tool that has to produce a specific artifact in a specific situation, every degree of freedom you remove improves the output.

The five skills, install one-liner, and adaptation guides for Codex/OpenCode are at github.com/ketankhairnar/tacit-skills. Issues and PRs welcome — especially with examples where a skill was wrong.