Last week I wrote about the last 10%. About the difference between “looks done” and “is done.” Since then, I’ve been thinking about a follow-up question that I consider more dangerous than the original problem:
Who even notices that the 10% is missing?
The Verification Problem
With the AI-generated C compiler CCC, the optimization levels -O0 through -O3 produce byte-identical binaries. The flags exist, they are accepted, they don’t generate any error messages. Everything looks correct.
To realize that something is fundamentally wrong here, you need to know what compiler optimization is supposed to do. You need to know that -O2 and -O3 are supposed to enable progressively more aggressive transformations such as inlining, loop unrolling, and vectorization (which level enables what varies by compiler), and that for any non-trivial program the resulting binaries must differ in size and structure. This knowledge doesn’t come from the documentation. It comes from years of working with compilers, from reading assembly output, from debugging optimization issues.
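Once you know what to look for, the check itself is small. The following is a minimal sketch, not the way CCC was actually examined: it assumes the same source file has already been compiled once per optimization level and that the bytes of each binary have been read into memory. The function name group_identical_outputs is hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def group_identical_outputs(binaries: dict[str, bytes]) -> list[list[str]]:
    """Group optimization flags whose binaries are byte-identical.

    `binaries` maps a flag such as "-O0" to the bytes of the binary that
    was built with it. Only groups containing more than one flag are
    returned, because those are the suspicious ones.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for flag, data in binaries.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(flag)
    return [sorted(flags) for flags in by_digest.values() if len(flags) > 1]
```

In practice the bytes would come from something like pathlib.Path("a.out").read_bytes() after each build. An empty result is the expected case for a non-trivial program. A single group containing all four levels is the signal that the flags are decoration. For a very small test program, two neighboring levels producing the same output is not necessarily a bug, which is exactly the judgment the script cannot make for you.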
The problem isn’t that AI writes faulty code. The problem is that you need experience to spot the errors, and that this very experience is currently at risk.
The OCaml PR from my last article illustrates the same pattern from a different angle. The copyright headers listed a real Jane Street developer as the author. Anyone unfamiliar with the OCaml ecosystem, who doesn’t know who Mark Shinwell is or what he’s working on, wouldn’t notice this. The community recognized the hallucination immediately. They had the context that the submitter lacked.
Reviewing expertise is invisible until it’s missing.
Automation Complacency
In aviation, there is a phenomenon that has been studied since the 1990s: automation complacency, the creeping erosion of human skills due to overconfidence in automated systems. Pilots who rely too heavily on autopilot gradually lose the ability to fly manually in critical situations. This isn’t because they’re getting worse. Skill requires practice, and practice requires opportunity.
The parallel to software development is disturbingly direct.
When a coding agent produces a function that “works” in seconds, the incentive to understand the code in detail diminishes. Why spend an hour wrestling with an algorithm when AI can write it in ten seconds? Why manually debug relocations when the agent generates the linker code? Why work through a specification when the model “knows” it?
The answer is uncomfortable: Because you can only verify what you understand. And you only understand what you’ve struggled with.
Experience Is Not a Status
There is a widespread notion that experience is something you eventually “have.” A plateau where you can rest. Thirty-five years of programming, so I can evaluate everything.
That’s wrong. Experience is an ongoing process. It doesn’t come from observing, but from doing. From failing at a race condition at two in the morning. From debugging a memory leak that only occurs under load. From the moment when a test passes but production still crashes, and you learn why.
Every layer of abstraction you put between yourself and the problem is one less layer of experience you gain. AI agents are the most powerful layer of abstraction our industry has ever seen. That is both their greatest strength and their greatest danger.
You can’t shortcut experience, you can’t buy it, and you can’t prompt it. You can only gain it. And you have to keep gaining it to retain it.
What drives me here isn’t the question of whether I can still verify; I have 35 years of context. It’s the question of whether someone starting today will ever build that context. If the first 90% is done in seconds, when do you even learn to see the missing 10%?
What I Do in Practice
I’m not a machine-breaker. I use AI coding agents every day, and they make me more productive. But over the past few months, I’ve built structures to ensure that testing isn’t left out. Because I’ve seen the alternative.
Tests Before the First Prompt
Before I let an agent tackle a problem, I write tests. Not the agent. Me. This forces me to understand the problem before I delegate it. What should the function do? What are the edge cases? What must not happen under any circumstances?
That sounds obvious. But it’s amazing how many developers leave both the problem and the verification to the agent, and then wonder why the two fit together perfectly yet are still wrong. An agent that tests its own code is like a student grading their own exam.
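To make this concrete, here is a sketch of what such a hand-written test can look like before any implementation exists. The function under test, parse_port, is a hypothetical example chosen for illustration, and its contract (whitespace tolerated, valid range 1 to 65535, ValueError on anything else) is an assumption. The three comment blocks follow the three questions above.

```python
from typing import Callable


def check_parse_port(parse_port: Callable[[str], int]) -> None:
    """Hand-written contract for a hypothetical parse_port(text) function."""
    # What should the function do?
    assert parse_port("8080") == 8080
    assert parse_port(" 443 ") == 443  # surrounding whitespace is tolerated

    # What are the edge cases? Here: both ends of the valid range.
    assert parse_port("1") == 1
    assert parse_port("65535") == 65535

    # What must not happen under any circumstances?
    # Invalid input must never be accepted silently.
    for bad in ["", "0", "65536", "-1", "http", "80.0"]:
        try:
            parse_port(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted invalid port {bad!r}")
```

The check takes the implementation as a parameter, so it can be written and read before the agent has produced a single line. A naive implementation that simply returns int(text) passes the first two groups and fails the third, because it accepts "0".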
The Agent Gets a Pipeline
Every agent in my workflow has a defined pipeline: Compile, Lint, Test. Not optional, cannot be skipped. The agent sees the errors in its own work and iterates. This catches the obvious problems: syntax errors, missing imports, broken types.
But a green pipeline is not proof of correctness. It is a necessary minimum, not a sufficient criterion. CCC’s decorative optimization flags would pass any pipeline. So would the hallucinated copyright headers.
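The mechanics of such a pipeline are simple. The following is a minimal sketch under the assumption that each stage is an external command that signals failure through a non-zero exit code. The name run_pipeline and the shape of the returned report are hypothetical.

```python
from __future__ import annotations

import subprocess


def run_pipeline(steps: list[tuple[str, list[str]]]) -> tuple[bool, str]:
    """Run each (name, argv) step in order and stop at the first failure.

    Returns (True, "") if every step exits with status 0. Otherwise returns
    (False, report), where the report names the failed step and carries its
    captured output, so it can be handed back to the agent for another try.
    """
    for name, argv in steps:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            return False, f"{name} failed:\n{result.stdout}{result.stderr}"
    return True, ""
```

The steps would be the project’s own compile, lint, and test commands, in that order. Stopping at the first failure is a design choice: a lint report about code that doesn’t compile is mostly noise. And the return value only says that nothing was caught, which is a weaker statement than "correct".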
Quality Gates as Scripts
For recurring issues I’ve noticed, I write explicit check scripts: Are there open TODOs in the code? Are errors swallowed instead of handled? Are there functions without tests? Are dependencies being imported that aren’t defined in the project? These are automatable heuristics. No substitute for understanding, but a safety net for the things you’ve overlooked before.
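Two of these checks can be sketched in a few lines. This is an illustration for Python sources, not the scripts described above. The function names are hypothetical, and both checks are deliberately crude heuristics: the TODO check looks at anything that resembles a comment, and the swallowed-error check only flags handlers whose body does nothing at all.

```python
from __future__ import annotations

import ast
import re

# Anything after a "#" that contains TODO or FIXME as a whole word.
TODO_PATTERN = re.compile(r"#.*\b(TODO|FIXME)\b")


def find_open_todos(source: str) -> list[int]:
    """Return 1-based line numbers of comments containing TODO or FIXME."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if TODO_PATTERN.search(line)
    ]


def _does_nothing(stmt: ast.stmt) -> bool:
    """True for a bare `pass` or a bare `...` statement."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def find_swallowed_errors(source: str) -> list[int]:
    """Return line numbers of except blocks whose body is only pass or `...`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
        and all(_does_nothing(stmt) for stmt in node.body)
    )
```

A handler that logs and continues, or one that re-raises, is not flagged. Whether such a handler is correct is still a question for the person reading the findings.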
The QA Loop
The most time-consuming part of my workflow is a verification cycle: I have one agent review the work of another agent. The findings become tickets. The tickets are processed. And then the revisions are reviewed again.
This mirrors the process that has worked in good software teams for decades: Someone writes, someone else reviews, and the discussion in between is where understanding emerges. The difference that matters: In the QA loop, I am the person who reads the findings and decides whether they are relevant. The agent finds patterns. I evaluate them.
This assumes that I can evaluate. And that brings us back to the core of the problem.
The Silent Cycle
In software development, there is a cycle that is rarely spoken of:
You need experience to recognize errors. You need errors to gain experience. You need the opportunity to make errors.
When AI agents take over the opportunity to make mistakes, because they write the code before the human has fully grasped the problem, they disrupt this cycle. Not maliciously. Not through poor results. But through results that are too good, too fast, and too convenient, which remove the incentive to dig deeper.
In my professional circle, I observe a strange willingness to accept this disruption. Developers are automating away their own learning curve. Not because they underestimate their abilities, but because the short-term productivity gain is so compelling that the long-term costs remain abstract.
The last 10% doesn’t get any easier. But if no one has the experience to recognize it anymore, it becomes invisible. And invisible problems are the most dangerous.
Craftsmanship Means Being Able to Verify
At my first company in Erfurt, in the mid-90s, there was a sentence on our website: “Software development today is a purely engineering feat. This is part of our craft.”
There, craftsmanship didn’t just mean being able to make something. It meant being able to assess the quality of what was made. A carpenter can tell from a joint whether it will hold. A mason can tell from a wall whether it is straight. A software craftsman reads code and knows whether it works. Not because he runs it, but because he understands it.
This ability is not a luxury. It is the foundation of everything that follows: maintenance, debugging, further development, security. And it arises only through what no AI can take away from you: the ongoing, often tedious, sometimes frustrating practice of doing it yourself.
90% complete is a good start. But here’s a warning: If we have 100 projects that are 90% complete, how much will 100 times the missing 10% cost us?