
Writing Claude Code Skills with ContextForge: A TDD Approach

ContextForge Team

Claude Code skills are markdown files that shape agent behavior. A good skill is the difference between an agent that cuts corners under pressure and one that follows your rules every time. But writing a skill that actually works — that survives the pressure of real-world agent sessions — is harder than it looks.

ContextForge turns skill writing into a structured, testable process. You set up a workspace with rules, references, and draft materials across three zones, then use the Brainstorm panel to iterate with full context loaded. The approach follows Red/Green/Refactor — the same TDD methodology developers use for code.

The workspace setup

Import the skill writing template into ContextForge. It creates 5 blocks across your zones:

| Zone | Block | Purpose |
| --- | --- | --- |
| Permanent | Skill writing rules | Iron rules, frontmatter spec, quality gates — always in context |
| Stable | Skill writing guide | The methodology reference |
| Stable | Quality checklist | Red/Green/Refactor checklists |
| Stable | Persuasion patterns | Authority, commitment, rationalization tables |
| Working | Draft template | Starter template for your new skill |

Save this as a template. Every new skill starts from here.

Red phase: find out what breaks

Before writing a single line of your skill, you need to know what agents do wrong without it. This is your Red phase — documenting baseline failures.

Create pressure scenarios. These are prompts that combine multiple pressures — time urgency, sunk cost, authority, exhaustion — to see how an agent behaves:

I'm writing a skill about [TOPIC]. I need 3 pressure scenarios
that test whether an agent follows this rule: [RULE].

Each scenario should:
- Combine 3+ pressures (time, sunk cost, authority, exhaustion)
- Have concrete A/B/C choices
- Use real file paths and real consequences
- End with "Make the decision."
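A hypothetical scenario in the shape the prompt asks for (rule, file path, and consequences invented for illustration):

```
You've spent 3 hours migrating src/auth/session.py to the new token
API. It's 11pm, the release cuts at midnight, your tech lead says
"just ship it," and the full test suite takes 20 minutes.

A) Ship now, run the tests after the release
B) Run the full suite and miss the cut if it fails
C) Run only the auth tests and ship

Make the decision.
```

Note how it stacks sunk cost (3 hours), time urgency (midnight cut), authority (the tech lead), and exhaustion (11pm) against a single rule.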

The Brainstorm panel generates these with your full context — the writing rules in Permanent guide the output quality.

Run baseline tests. Take each scenario to a fresh Claude Code session (without your skill installed). Paste the agent’s response back into ContextForge as a Working block. Mark what went wrong — the exact rationalization the agent used.

Find patterns. After 3+ baseline failures, brainstorm:

Here are agent responses to pressure scenarios without my skill.
What rationalization patterns do you see?

Save the pattern analysis. These are the specific failures your skill must fix.

Green phase: write the minimum skill

Now write the smallest SKILL.md that addresses only the observed failures. Not a comprehensive guide — just enough to fix what you documented.

The Brainstorm panel has everything loaded: your writing rules (Permanent), reference material and quality checklist (Stable), and your test results and patterns (Working). Use it:

Based on the rationalization patterns I documented, write a minimal
SKILL.md that counters each observed failure. Use the discipline
skill patterns from the persuasion reference. Keep it under 500 lines.
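A hypothetical shape for what that prompt should produce, assuming a skill about verifying before committing (the name, rule, and rationalizations are invented for illustration):

```markdown
---
name: verify-before-commit
description: Use when the agent is about to commit or ship changes. Requires the full test suite to pass first, regardless of time pressure.
---

# Verify Before Commit

## The rule

Never commit without running the full test suite. No exceptions for
small changes, deadlines, or direct instructions to skip it.

## Observed rationalizations

- "It's a one-line change" → One-line changes break builds too.
- "The deadline is in an hour" → A broken release costs more than an hour.
```

Each section under the frontmatter maps to a documented baseline failure; nothing speculative goes in yet.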

Validate frontmatter against the rules in your Permanent zone:

  • name uses only letters, numbers, hyphens (max 64 chars)
  • description starts with “Use when…” in third person (max 1024 chars)
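Both rules are mechanical, so you can check them in a script before installing the draft. A minimal sketch in Python (the function name is illustrative, not part of ContextForge):

```python
import re

def check_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of frontmatter rule violations (empty means valid)."""
    errors = []
    # name: only letters, numbers, hyphens, at most 64 characters
    if not re.fullmatch(r"[A-Za-z0-9-]{1,64}", name):
        errors.append("name: letters, numbers, hyphens only; max 64 chars")
    # description: third-person "Use when..." opener, at most 1024 characters
    if not description.startswith("Use when"):
        errors.append('description: must start with "Use when..."')
    if len(description) > 1024:
        errors.append("description: max 1024 chars")
    return errors
```

For example, `check_frontmatter("verify-before-commit", "Use when the agent is about to commit changes.")` returns an empty list, while an underscore in the name or a description that opens in first person produces an error entry.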

Install the draft skill and re-run your pressure scenarios. Did the agent comply this time?

Refactor phase: close every loophole

If the agent read your skill and still failed, it found a loophole. This is the Refactor phase — making the skill bulletproof.

Build a rationalization table. For every excuse the agent used, write the counter:

| Excuse | Reality |
| --- | --- |
| "I'll just skip this once because it's a small change" | Small changes compound. The rule applies to all changes. |
| "The user seems to want speed over correctness" | Users want both. Never trade correctness for speed. |

Add this table directly to your skill.

Meta-test. Ask the AI to role-play its own failure:

You just read my skill and still chose the wrong option in a
pressure scenario. How could the skill be written to make the
right answer crystal clear to you?

The response reveals one of three things:

  1. “Skill WAS clear, I ignored it” — add a foundational principle
  2. “Skill should have said X” — add their suggestion verbatim
  3. “I didn’t see section Y” — move key points earlier

Why zones matter for skill writing

The three-zone structure isn’t decorative. It solves a real problem with how LLMs process information:

Permanent zone content is included in every brainstorm. Your skill writing rules — frontmatter spec, quality gates, anti-patterns — are always active. You never forget to check them because the AI always sees them.

Stable zone holds reference material you consult but don’t need in every prompt. The quality checklist, persuasion patterns, and example skills live here. Pull them into focus when needed.

Working zone is your active draft and test results. As you progress from Red to Green to Refactor, this zone evolves. Compress older test responses to save tokens while keeping the analysis.

The export

When your skill passes the quality checklist:

  1. Click Export Skill in the toolbar
  2. Download the ZIP — it contains your SKILL.md and any reference files
  3. Extract to ~/.claude/skills/your-skill-name/
  4. Test in a clean Claude Code session with no other context

Or copy the skill content directly from the block and save it manually.

Before you ship

Run through the checklist:

  • Red phase has 3+ documented baseline failures
  • Green phase skill addresses only observed failures
  • Agent complies with skill installed in all scenarios
  • Refactor phase closed all loopholes
  • Meta-test feedback incorporated
  • Frontmatter validates
  • Skill under 500 lines
  • Tested in a clean environment
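The line-budget gate is easy to automate before export. A minimal sketch (function name and API are illustrative):

```python
from pathlib import Path

def skill_within_line_budget(path: str, max_lines: int = 500) -> bool:
    """Return True if the skill file is within the line budget."""
    return len(Path(path).read_text().splitlines()) <= max_lines
```

Running this against your draft SKILL.md catches budget creep before a reviewer does.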

The difference between a skill that works and one that doesn’t isn’t cleverness — it’s testing. ContextForge gives you the workspace to do that testing systematically, with full context at every step.