You Can’t Smell Bad AI Code
What if the tool saving you the most time is also creating problems you’re not equipped to find?
That’s not a hypothetical. That’s Tuesday.
For years, experienced engineers developed a sense for bad code. It had tells. Inconsistent naming. Shortcuts in the wrong places. Comments that explained what the code did instead of why. Logic that worked but felt like someone was in a hurry. You didn’t always have to trace the execution path. You could smell it.
I’ve been doing root cause analysis on broken systems for a long time. Customer after customer, stack after stack, across industries and architectures I didn’t design and wasn’t warned about. After enough of that, you stop reading code linearly. You start reading it the way a doctor reads a chart: looking not for what’s there, but for what shouldn’t be and for what’s conspicuously missing.
AI-generated code doesn’t give you either of those signals. And that changes everything about how you have to approach it.
When Clean Code Lies
Here’s a case I’ve seen play out more than once.
A developer asks an AI to write an API that calls three backend services at the same time and combines the results. The AI delivers something that looks modern, parallel, and efficient:
csharp
public async Task<OrderResult> GetOrderData(string id)
{
    var customerTask = GetCustomer(id);
    var inventoryTask = GetInventory(id);
    var pricingTask = GetPricing(id);

    await Task.WhenAll(customerTask, inventoryTask, pricingTask);

    return new OrderResult
    {
        Customer = customerTask.Result,
        Inventory = inventoryTask.Result,
        Pricing = pricingTask.Result
    };
}
Code review passes. Nobody flags anything. It looks exactly like well-written code should look.
Then it hits real traffic. CPU stays low. Memory is normal. No error messages anywhere. But requests start hanging. Response times climb from two hundred milliseconds to thirty seconds. Then keep climbing.
The investigation starts where it always starts. Is something deadlocked? Is the database slow? Is the network saturated? Nothing obvious appears. The code looks fine every time you look at it.
What the AI also generated, quietly, in every helper method throughout the codebase, was this:
csharp
public Task<Customer> GetCustomer(string id)
{
    return Task.Run(() =>
    {
        return _client.GetCustomerAsync(id).Result;
    });
}
In plain terms: every helper hands its work to a thread-pool thread with Task.Run, and that thread then blocks on .Result until the underlying asynchronous call finishes. The method still returns a Task, so it looks non-blocking from the outside, but a thread sits idle for the entire wait. It’s the software equivalent of installing an emergency exit that opens into a wall.
A human developer might write that mistake once. The AI wrote it five hundred times, because it learned from thousands of code examples online, many of them mediocre, and reproduced the pattern at scale without understanding why it was dangerous.
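Each call looked correct in isolation. Collectively, they starved the thread pool: every in-flight call held a thread that did nothing but wait, leaving too few free to process new requests. That is why CPU stayed low while requests hung. The failure wasn’t a bug in one place. It was the same quiet mistake, everywhere, simultaneously.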
That’s the distinction that matters: human mistakes are local. AI mistakes are fractal.
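For contrast, here is a minimal sketch of the blocking helper next to a version that simply passes the task through. The `Customer`, `ICustomerClient`, `DelayedCustomerClient`, and `CustomerService` types are hypothetical stand-ins, since the original codebase isn't shown; only the shape of the two methods is the point.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-ins for the Customer type and _client used above.
public class Customer
{
    public string Id { get; set; } = "";
}

public interface ICustomerClient
{
    Task<Customer> GetCustomerAsync(string id);
}

// Fake client that simulates network latency (an assumption for illustration).
public class DelayedCustomerClient : ICustomerClient
{
    private readonly int _delayMs;

    public DelayedCustomerClient(int delayMs) { _delayMs = delayMs; }

    public async Task<Customer> GetCustomerAsync(string id)
    {
        await Task.Delay(_delayMs); // no thread is held during the delay
        return new Customer { Id = id };
    }
}

public class CustomerService
{
    private readonly ICustomerClient _client;

    public CustomerService(ICustomerClient client) { _client = client; }

    // The pattern from the article: Task.Run borrows a thread-pool thread,
    // and .Result blocks that thread until the inner call completes.
    public Task<Customer> GetCustomerBlocking(string id)
    {
        return Task.Run(() => _client.GetCustomerAsync(id).Result);
    }

    // Pass-through version: returns the client's task directly, so no
    // thread is parked while the call is in flight.
    public Task<Customer> GetCustomer(string id)
    {
        return _client.GetCustomerAsync(id);
    }
}
```

Both methods return the same result for a single call, which is exactly why the difference is invisible in review and in unit tests. It only shows up when many calls are in flight at once.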
A Different Kind of Wrong
This is where the comparison to human-written code becomes important.
Bad human code is wrong in human ways. A developer under deadline pressure takes a shortcut. A junior engineer misunderstands an abstraction. Someone copies a pattern without fully grasping why it works in one context and fails in another. These mistakes leave traces. They have shape. Experienced reviewers have seen them before and know where to look.
AI code fails differently. It doesn’t cut corners because it doesn’t know what corners are. It optimizes for code that looks correct, reads cleanly, and passes tests. What it doesn’t account for is what happens at scale. What ten thousand simultaneous users do to a system. How its own error-recovery logic looks to a downstream service that’s already struggling. The gap between a function that works and a system that holds together under pressure.
The result is code that is technically defensible at every line and operationally dangerous as a whole.
No one announced the shift. I noticed it because the cases themselves changed before anyone started talking about it: a growing share of the systems I’m asked to diagnose were built with significant AI assistance. Nobody leads with that. It usually surfaces mid-investigation, after I’ve already noted that the code is unusually tidy and the failure is unusually strange. The two things, it turns out, are related.
The Bottleneck Nobody Is Naming
The review heuristics built over decades were calibrated to human mistakes. Sloppy ones. Obvious ones. Ones that left fingerprints. AI doesn’t leave the same fingerprints. It fails in places that look clean on first read and second read, and only reveal themselves once you’ve ruled out everything else.
This is the bottleneck nobody is naming. AI made code generation fast. It didn’t make code judgment fast. If anything, it made judgment harder, because it removed the surface signals reviewers relied on to know where to focus their attention.
More code. Fewer tells. Same number of hours in the day.
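One consequence of a mistake that repeats in the same textual shape is that a crude text scan can restore some of the missing tells. The sketch below is a heuristic, not a real analyzer: `BlockingCallScanner` is a hypothetical name, and the regular expression only looks for common blocking calls on tasks (`.Result`, `.Wait()`, `.GetAwaiter().GetResult()`). It will also flag harmless uses, such as reading `.Result` after `Task.WhenAll` has completed, so treat hits as places to look, not verdicts.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BlockingCallScanner
{
    // Matches .Result (as a property, not a method call), .Wait(),
    // and .GetAwaiter().GetResult(). Purely textual; no type information.
    private static readonly Regex Blocking = new Regex(
        @"\.Result\b(?!\s*\()|\.Wait\(\s*\)|\.GetAwaiter\(\)\.GetResult\(\)",
        RegexOptions.Compiled);

    // Returns the 1-based line number and trimmed text of every line
    // in `source` that contains a blocking-call pattern.
    public static List<(int Line, string Text)> Scan(string source)
    {
        var hits = new List<(int Line, string Text)>();
        var lines = source.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            if (Blocking.IsMatch(lines[i]))
            {
                hits.Add((i + 1, lines[i].Trim()));
            }
        }
        return hits;
    }
}
```

Run over a whole repository, a count of five hundred hits in five hundred helpers is itself a signal: it suggests one generated pattern rather than five hundred independent decisions.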
So what does this mean in practice? It means code review can no longer be treated as a formality that fast-moving teams tolerate between sprints. It has to be treated as a discipline, one that requires understanding not just whether individual lines are correct, but whether the system they compose will hold together when reality shows up. It means asking questions AI doesn’t ask: What happens under load? What does failure look like from the outside? Where are the hidden assumptions?
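"What happens under load?" can be asked cheaply before production does the asking. The following is a rough sketch of a concurrency probe, assuming the operation under review can be wrapped in a `Func<Task>`; `LoadProbe`, `MeasureAsync`, and `Percentile` are illustrative names, not part of any library. It starts a batch of calls at once, records each call's latency, and reports a nearest-rank percentile.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class LoadProbe
{
    // Starts `concurrency` calls together and returns each call's latency
    // in milliseconds, sorted ascending.
    public static async Task<double[]> MeasureAsync(Func<Task> call, int concurrency)
    {
        var tasks = Enumerable.Range(0, concurrency).Select(async _ =>
        {
            var stopwatch = Stopwatch.StartNew();
            await call();
            return stopwatch.Elapsed.TotalMilliseconds;
        }).ToArray(); // ToArray starts every call before any is awaited

        var latencies = await Task.WhenAll(tasks);
        Array.Sort(latencies);
        return latencies;
    }

    // Nearest-rank percentile over an ascending-sorted array (p in 0..100).
    public static double Percentile(double[] sorted, double p)
    {
        if (sorted.Length == 0)
        {
            throw new ArgumentException("No samples to summarize.", nameof(sorted));
        }
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
        return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
    }
}
```

A useful comparison is the same call at a concurrency of 1 and at a few hundred. If the high percentile grows far faster than the concurrency does while CPU stays idle, something is waiting on a shared resource such as the thread pool. The exact numbers depend on the machine, so the trend matters more than any single value.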
The teams that adapt fastest won’t be the ones generating the most code. They’ll be the ones who recognized that the skill of reading code and the skill of writing it have never been more different than they are right now, and invested in both.