"Just Ask Claude" Is Not a Strategy: Learning to Love Spec-Anchored Development

I've never been a "classic" developer. I learned to code by building — hands-on, inside real codebases. Which means I've always been a little intimidated by things like Test-Driven Development. I associated TDD with "real programmers" who "actually know what they're doing."

So, in the name of self-improvement, I made a lot of false starts at becoming a "real" TDD developer. Write the tests, verify them, then write code until they pass. It never stuck. So much of how I build is trying stuff out, seeing it, and iterating. TDD assumes you already know what "done" looks like, which is easy when you have hard requirements. But hard requirements don't exist when you're still figuring it out as you build it. When you're a product-minded engineer, a solo developer, or just building for yourself, you usually don't have the requirements until after you've built the thing and played with it.

When AI coding started getting good, the same question came up everywhere: how do you get these agents to ship code that actually works? Most of the answers were variations on TDD.

Test-Driven — Write tests first. Tell the AI to implement until they pass.
Spec-Driven — Write a spec. Tell the AI to implement against it.
Spec-Anchored — Write the spec first, then let it evolve as a living document as the app develops.

I started with TDD, because I thought: oh finally, I can be a TDD developer without having to write the tests! As it turned out, having the AI write the tests first didn't produce better code than having it write tests after the fact. It kept the model on task, which helped. But it wasn't the leg up I was hoping for.

So I moved to spec-driven. This was better — I could sit down, write the high-level spec for what I was building (usually conversationally, with an AI), and hand it off to implement. I could even have it work TDD-style underneath, letting the tests pin down the spec.

And it works. Initially.

But as the codebase grows, something breaks down. There's so much less human-in-the-loop that you never build the granular, in-your-bones knowledge of the app you used to have pre-AI. What decisions got made? What changed? How would I even describe the current state of this thing to another human — without saying "well, you can try it in the app, or ask Claude"?

The fact is, leaning on AI this heavily costs developers the mental model for how a change will affect the system. That's a massive liability, because when the model misunderstands a change, the AI isn't the one who gets fired.

Then I learned about this middle approach called "Spec-Anchored" from this paper.

The specification spectrum: from Code-First to Spec-as-Source

Spec-anchored felt like the sweet spot. The catch: keeping the spec and the code in sync is real work, and I knew I'd need heavy AI assistance to pull it off.

My first attempt was on an existing codebase (my personal AI harness, very meta). I handed Claude a paper on the approach, a few articles, and asked it to build a skill for how I wanted my specs maintained:

Specs live in the repo, next to the code they describe, under /spec.
Track the core decisions, stories, and requirements.
Keep a log mapping each change back to the codebase itself.

This is the layout we agreed upon.

The /spec folder layout, where everything is addressable by a stable ID
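As a rough sketch of what "addressable by a stable ID" plus a change log could look like in code: the class names, fields, and the "REQ-001" ID format below are illustrative assumptions, not the actual layout of the harness's /spec folder.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpecItem:
    """One addressable entry under /spec: a decision, story, or requirement."""
    item_id: str  # stable ID, e.g. "REQ-001" (hypothetical naming scheme)
    kind: str     # "decision", "story", or "requirement"
    title: str


@dataclass
class ChangeLogEntry:
    """Maps one change back to the spec items and code paths it touched."""
    summary: str
    spec_ids: list[str]
    code_paths: list[str]


@dataclass
class SpecIndex:
    """In-memory view of the spec: items keyed by ID, plus the change log."""
    items: dict[str, SpecItem] = field(default_factory=dict)
    log: list[ChangeLogEntry] = field(default_factory=list)

    def add_item(self, item: SpecItem) -> None:
        # IDs must stay unique, otherwise they stop being stable references.
        if item.item_id in self.items:
            raise ValueError(f"duplicate spec ID: {item.item_id}")
        self.items[item.item_id] = item

    def record_change(self, entry: ChangeLogEntry) -> None:
        # Refuse log entries that point at spec items that don't exist.
        unknown = [i for i in entry.spec_ids if i not in self.items]
        if unknown:
            raise KeyError(f"change references unknown spec IDs: {unknown}")
        self.log.append(entry)

    def changes_for(self, item_id: str) -> list[ChangeLogEntry]:
        """Every logged change that touched the given spec item."""
        return [e for e in self.log if item_id in e.spec_ids]
```

The point of the stable ID is that `changes_for("REQ-001")` can answer "what changed here, and where in the code?" without anyone having to ask the model.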

Once the workflow was set, I had Claude (Opus 4.7) write a comprehensive spec capturing what the harness already did and how it worked.

Then I needed a way to add new specs that would actually get implemented the way I wanted. That became my Spec-Anchored Workflow: hand it a new user story, bug, or change over chat, and walk it all the way to a deployed staging artifact.

Chat → Issue → PM Agent → Approval → Coding Agent → Approval → Staging

The Spec-Anchored Change workflow in the AI harness

This is a picture of the actual workflow within the AI harness that runs these changes.

The "Issue Opened" event trigger that fires the workflow off new issues

Since it's event-based, it's triggered off actual issues being opened. I've also experimented with batch runs, which work just as well.

The PM Agent has exactly one job: take the ask from chat, make sure it makes sense and doesn't already exist, and write a spec. Then it hands that spec to me for approval. Once I approve, a coding agent picks it up on the same branch and actually implements it. I get a preview link, screenshots, and a video — and I approve again. On approval, it auto-merges, deploys to staging, and verifies itself there.
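The flow above is essentially a small state machine with two human gates. Here is a minimal sketch under stated assumptions: the stage names are made up for illustration, and sending a rejected change back to the agent that produced it is my assumption, not something the harness is confirmed to do.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    CHAT = "chat"
    ISSUE = "issue"
    PM_AGENT = "pm_agent"
    SPEC_APPROVAL = "spec_approval"        # human gate 1: approve the spec
    CODING_AGENT = "coding_agent"
    PREVIEW_APPROVAL = "preview_approval"  # human gate 2: approve the preview
    STAGING = "staging"


# Enum iteration preserves definition order, so this is the pipeline order.
PIPELINE = list(Stage)
HUMAN_GATES = {Stage.SPEC_APPROVAL, Stage.PREVIEW_APPROVAL}


def advance(stage: Stage, approved: bool | None = None) -> Stage:
    """Return the next stage; human gates require an explicit decision."""
    if stage is Stage.STAGING:
        raise ValueError("staging is the final stage")
    position = PIPELINE.index(stage)
    if stage in HUMAN_GATES:
        if approved is None:
            raise ValueError(f"{stage.name} needs a human decision")
        if not approved:
            # Assumption: a rejection goes back to the agent just before the gate.
            return PIPELINE[position - 1]
    return PIPELINE[position + 1]
```

Nothing moves past `SPEC_APPROVAL` or `PREVIEW_APPROVAL` without an explicit `approved=True`, which is the whole reason the gates exist.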

The first gate, the PM, exists to make sure what you're asking for actually makes sense. A good PM, when you ask for something, will inevitably ask you about the trade-offs of the change. "You want this feature, but that means this other feature gets more complicated. Is that okay?"

This is a big issue with AI coding, but also what makes it great: you ask for something, it does it. The problem is you have no idea what else that change will do or if it'll break something else you care about in the process. The spec is where I see the blast radius of a change before it happens. That's the part of spec-anchored that I find genuinely useful over longer time horizons.
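One way to make "blast radius" concrete, assuming each spec item declares which other items it depends on (that dependency map and the IDs below are hypothetical, and this is only a sketch of the idea, not how the PM Agent actually works): walk the dependency graph backwards from the item being changed and collect everything that relies on it, directly or indirectly.

```python
from __future__ import annotations

from collections import deque


def blast_radius(depends_on: dict[str, set[str]], changed: str) -> set[str]:
    """Return every spec item that directly or transitively depends on `changed`."""
    # Invert the edges: for each item, which items depend on it?
    dependents: dict[str, set[str]] = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(item)

    # Breadth-first walk outward from the changed item.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent != changed and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical spec: STORY-2 relies on REQ-1, and STORY-3 relies on STORY-2.
spec_links = {
    "STORY-2": {"REQ-1"},
    "STORY-3": {"STORY-2"},
    "REQ-4": set(),
}
print(sorted(blast_radius(spec_links, "REQ-1")))  # ['STORY-2', 'STORY-3']
```

A list like that, attached to the proposed spec change, is the kind of thing worth seeing at the first approval gate, before any code gets written.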

The second gate (screenshots/video/validation) is, honestly, a crutch. If I were being as AI-centric as possible, I'd hand the result back to the PM to verify the coding agent did what the spec defined. And if the spec were perfect, that'd be enough. But the second gate keeps existing because seeing the real thing reminds me of something, makes me want to change something, or just lets me change my mind. I don't trust the AI to own that yet. It's also instant documentation for sharing changes with stakeholders.

Between these two gates, I can "see" changes extremely rapidly. Changes come in, I understand the scope, and I stay in the pilot's seat and make the call on what we're building. Then, fairly quickly, I get to see it actually working and how it actually looks and feels. Only if I like it does it get merged into my staging environment.

Not only are ideas cheap to test, but they're no longer stomping on other features being delivered.

So I have all this running on an existing, brownfield codebase. And it just WORKED. All of a sudden, I was getting really well-built code that actually did what I wanted. Where it didn't, it was because I didn't know what I wanted. That's the root of so many issues, with or without AI. We expect AI to read between the lines instead of working with AI to help us build the context, ask the right questions, and understand the gaps in our own thinking.

Then I decided to start a new project to see how this works greenfield. I copied over my skills, had my conversation with AI to build the PRD, and then fed that PRD into Claude Code to generate a spec from it.

Then I sat back and watched the AI implement it. The only tasks asked of me were to give it a database and access to my auth system — things that are human-gated. And on the other side, I had a fully working app.

I quickly had a bunch of change requests based on actually using the app: implementing the ability to switch accounts, adding the ability to upload photos, adding a thumbnail on the user profiles, etc. And amazingly, it just worked. The spec evolved, the app evolved.

This was the first time that I've really been able to say — hey, maybe this whole "software developers are out of a job" thing is real.

To be clear, it took significant work over months to develop the harness behind all this, and before that I had spent significant time getting the right template and setup in place. None of this is turnkey today (but it will be).

But boy. While we're certainly not at the point where you can take the software developer out of the process entirely, we're closer than I expected to be when I started writing this.

Maybe a lot closer.
