Gherkin May Be the Most Consequential Language of the AI-Augmented Era

We’ve watched a four-step progression in barely three years.

Prompting. Prompt engineering. Context engineering. Intent engineering.

Each step was hailed as the destination. Each turned out to be a waypoint. And here’s the part nobody is saying out loud: we have named the practice of intent engineering, but we have never named the language.

That’s a problem. Because the practice doesn’t work without one.

I’m going to plant a flag.

Gherkin may be the most consequential language of the AI-augmented engineering era.

Not because it’s elegant. It isn’t. Not because it’s new. It’s been sitting on the shelf for nearly twenty years. But because it is the only widely adopted notation that does three things at once. It expresses human intent in a form humans can read. It generates the tests that verify an agent’s output. And it survives the handoff between product, engineering, and AI agents without losing meaning.

Why this claim, and why now

When an agent can produce a working codebase in a weekend, the bottleneck is no longer code production. It’s two things: specifying intent precisely enough for an agent to execute it, and verifying that what came back is actually what you asked for.

Both are language problems.

Both are unsolved.

The industry’s response has been to invent better prompts, better context windows, better orchestration. All useful. None of them address the underlying issue: there is no shared, structured, executable language for expressing what we want and verifying what we got.

There is, however, a candidate. It’s structured natural language. It’s composable. It produces the tests that close the trust loop. And it’s already in your backlog under the name “acceptance criteria.” You’re just not using it the way the moment demands.

“See, I couldn’t walk the walk, couldn’t really talk the talk, Had to get my talk to properly explain my walk, Cause this lack in talk had my walk looking off.”

— Lupe Fiasco, Failure

The walk is what your team builds. The talk is how you specify what should be built. When the talk doesn’t match the walk, the walk looks off, and in the agent era the walk looks off at industrial scale.

Four pillars

Four pillars hold up the claim.

Code is cheap. Trust is expensive. When agents produce code in seconds, the scarce resource is justified confidence in what they produced. Gherkin closes the verification loop because the specification is the test. I’ve watched offshoring outfits deliver precisely what the requirements document said and have it rejected anyway, because no one had specified what the customer actually wanted. I wrote about that failure mode in 2009. Agents have not solved it. They have industrialized it.

Craftsmanship matters more, not less. When an agent can sprawl a codebase in 48 hours, the disciplines that constrain complexity (domain-driven design, ubiquitous language, hexagonal architecture) become survival tools. Gherkin is ubiquitous language made executable. It is the user-facing edge of the craftsmanship stack.

Quality engineering is the new load-bearing discipline. TDD, ATDD, BDD, and the discipline of identifying edge, corner, and error cases used to live at the periphery of the team. They are now the disciplines that constrain agent output. The QE specialist is about to be repriced.

The hybrid engineer is the new 10x engineer. The person who can express intent precisely and verify agent output against that intent is the highest-leverage human on the team. The line between PM and engineer is dissolving.

Each pillar deserves its own article. None are throwaway claims. All point at the same load-bearing conclusion: in the AI-augmented engineering era, the specification is the artifact that holds up everything else.

What this series is, and isn’t

I owe you the limits of the claim.

Gherkin expresses behavioral intent. It is not a substitute for architectural diagrams, design documents, or implementation-level constraints. That’s a real boundary, and I’m naming it now so we don’t pretend otherwise.

But the boundary is smaller than most readers assume. Non-functional requirements (performance, availability, accessibility, security) are routinely treated as a separate category from behavior. They aren’t. A well-formed NFR is a behavioral expectation with measurable thresholds:

    Given the system has 10,000 active records
    When a user searches for a record
    Then the results appear within 2 seconds

That isn’t a quality goal floating in a Confluence page. It’s a passing or failing test. The NFRs that resist this treatment, like internal memory ceilings or specific implementation constraints, are real. They’re also a much smaller residue than people think.
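To make "a passing or failing test" concrete, here is a minimal sketch of how those three steps could be bound to code. It is an illustration, not how any particular tool works: the step registry, the `run_scenario` helper, and the in-memory dictionary standing in for "the system" are all assumptions of mine, written in plain Python so the sketch runs on its own. A real team would more likely reach for an established runner such as Cucumber or behave.

```python
import re
import time

# The scenario text, copied from the article. Everything below it is hypothetical glue.
SCENARIO = """\
Given the system has 10,000 active records
When a user searches for a record
Then the results appear within 2 seconds
"""

STEPS = []  # list of (compiled pattern, handler) pairs


def step(pattern):
    """Register a handler for any step whose text fully matches the pattern."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"the system has ([\d,]+) active records")
def seed_records(ctx, count):
    # Assumption: an in-memory dict stands in for the real system under test.
    total = int(count.replace(",", ""))
    ctx["records"] = {f"record-{i}": {"id": i} for i in range(total)}


@step(r"a user searches for a record")
def search_for_record(ctx):
    # Time the lookup so the Then step has something to measure.
    start = time.perf_counter()
    ctx["result"] = ctx["records"].get("record-9999")
    ctx["elapsed"] = time.perf_counter() - start


@step(r"the results appear within (\d+) seconds?")
def results_within(ctx, limit):
    assert ctx["result"] is not None, "search returned nothing"
    assert ctx["elapsed"] <= float(limit), f"search took {ctx['elapsed']:.3f}s"


def run_scenario(text):
    """Run each step in order; raise if a step has no definition or fails."""
    ctx = {}
    for line in text.strip().splitlines():
        keyword, _, body = line.strip().partition(" ")
        if keyword not in {"Given", "When", "Then", "And", "But"}:
            raise ValueError(f"not a step: {line}")
        for pattern, handler in STEPS:
            match = pattern.fullmatch(body)
            if match:
                handler(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step definition for: {line}")
    return ctx


if __name__ == "__main__":
    run_scenario(SCENARIO)
    print("scenario passed")
```

The point of the sketch is the shape, not the plumbing. The threshold lives in the sentence a product manager can read, and the same sentence is what fails the build when the search slows down. A step nobody has implemented fails loudly too, rather than being quietly skipped.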

So here’s the wager the series will defend. In the AI-augmented engineering era, behavioral intent is the load-bearing form of intent, because it is the form that closes the trust loop between a human, a specification, and an agent’s output. Architecture still matters. Pure implementation constraints still matter. But neither produces the executable verification that tells you the agent did what was asked.

That is what Gherkin does. That is why it’s the candidate.

What’s coming

Thirteen articles. One argument.

1. The Thesis (this article)
2. Defining AI-Augmented Engineering
3. The Progression Nobody Finished: From Prompting to Intent Engineering
4. Code Is Cheap. Trust Is Expensive.
5. Why Software Craftsmanship Matters More Now, Not Less
6. The Hidden Costs of AI-Generated Code Proliferation
7. Less Is More: The New Engineering Virtue
8. Platform and Language Selection in the Age of Agents
9. Quality Engineering Is the New Load-Bearing Discipline
10. Product Management as a Force Multiplier
11. The Hybrid Engineer: Why PM Skills Are the New Senior Engineering Skill
12. Stop Writing Acceptance Criteria You Don’t Convert
13. Gherkin Delivers: Closing the Loop on Intent-Engineered Software

If your team is already writing acceptance criteria in user stories without converting them into executable specifications before development begins, you are leaving the most important leverage point of this era on the table. Article 12 is for you. So is the capstone.

I’ve been writing code since I was twelve. Professionally since 1991. Five technology cycles: PC, client-server, web 1.0, cloud, and whatever we end up calling the era we’re in now. I have never been more convinced that the answer to the era’s hardest engineering question has been sitting in plain sight, in a notation most teams treat as “the QA team’s problem.”

The team that figures this out is going to outbuild the team that doesn’t.

Further reading: “Architectural Fidelity with Globally Distributed Software Development Teams.” Walter Pinson, 2009.