Gloss Key Takeaways
  1. Code churn has doubled since AI coding tools went mainstream, copy-paste is up 48%, and AI-generated code introduces about 1.7x more issues than human-written code in production.
  2. After steady gains, AI coding model quality appears to have plateaued in 2025 and may be declining, even as adoption and the volume of AI-generated code continue to rise.
  4. Security and design quality are significantly worse in AI output: in Veracode testing, 62% of AI-generated solutions contained design flaws or known vulnerabilities, and AI-generated code overall was 2.74x more likely to contain vulnerabilities than human-written code.
  4. Any speed gained in generating code is often lost to review, testing, debugging, and rewrites, with studies showing higher total costs and even longer task completion times for developers using AI tools.
  5. AI mistakes have shifted from obvious syntax problems to subtle conceptual and architectural errors that pass superficial checks but fail on edge cases or security fundamentals.

AI Code Is Getting Worse, Not Better

[Image: developer facing code quality warnings]

There's a number making the rounds that should bother anyone shipping software in 2026: code churn, the percentage of code thrown away within two weeks of being written, has doubled since AI coding tools became mainstream. Copy-pasted code is up 48%. And AI-generated code introduces 1.7x more issues than human-written code across production systems.

We traded one problem for a worse one. Writing code used to be the bottleneck. Now it's reviewing code that nobody fully understands.

The quality plateau

For two years, AI coding tools got steadily better. Models improved, context windows grew, suggestions became more relevant. Developers were genuinely more productive, at least by the metrics that are easy to measure.

Then the curve flattened. IEEE Spectrum reported that most core models reached a quality plateau over the course of 2025, and more recently seem to be in decline. The improvements stopped coming, but the adoption kept accelerating. More code is being generated by AI than ever before, and the quality of that code is no longer improving to match.

This matters because the early gains created trust. Developers got used to accepting suggestions. Review habits relaxed. The assumption that AI output was "good enough" became baked into workflows. And now that assumption is quietly breaking down.

The numbers are ugly

Veracode tested over 100 AI models on code-generation tasks. Sixty-two percent of AI-generated solutions contained design flaws or known security vulnerabilities. Across the board, AI-generated code was 2.74x more likely to contain vulnerabilities than human-written code.

The breakdown by vulnerability type is worse than the headline: AI code was 2.74x more likely to introduce cross-site scripting (XSS) vulnerabilities, 1.91x more likely to create insecure direct object references, and 1.88x more likely to implement improper password handling. In 86% of relevant code samples, AI tools failed to defend against basic XSS attacks.

These aren't obscure edge cases. These are OWASP top-10 vulnerabilities, the stuff that gets drilled into every security training program. The models know these vulnerabilities exist. They generate them anyway.
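To make that concrete, here is the shape of the reflected-XSS failure Veracode describes, as a minimal Flask sketch. The endpoint and handler names are hypothetical, chosen to illustrate the pattern rather than reproduce any model's actual output.

```python
# Minimal sketch of the reflected-XSS pattern described above
# (hypothetical endpoint; illustrates the failure class).
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)

# The vulnerable shape: untrusted input interpolated straight into HTML.
# A query like ?q=<script>alert(1)</script> executes in the victim's browser.
@app.route("/search")
def search():
    query = request.args.get("q", "")
    return f"<h1>Results for {query}</h1>"  # reflected XSS

# The fix is one call: escape untrusted input before it touches markup
# (or, better, render through a templating engine that auto-escapes).
@app.route("/search-safe")
def search_safe():
    query = request.args.get("q", "")
    return f"<h1>Results for {escape(query)}</h1>"
```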

[Image: code diff showing heavy churn and rewrites]

Faster output, slower delivery

The productivity story has gotten complicated. Vendors still claim 50% faster development. But a comprehensive cost analysis found that first-year costs with AI coding tools run 12% higher when you account for the full picture: 9% code review overhead, 1.7x testing burden from increased defects, and doubled code churn requiring constant rewrites.
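How a 50% generation speedup nets out to higher total cost is easier to see as arithmetic. Here is a back-of-envelope sketch with a hypothetical baseline for where a feature's hours go; only the multipliers above are taken from the cited analysis, the rest is assumed for illustration.

```python
# Back-of-envelope model of how "50% faster generation" can still cost
# more in year one. Baseline hours are hypothetical; the 9% review
# overhead, 1.7x testing burden, and doubled churn are the figures
# cited above.

baseline = {          # hours per feature, human-written code (assumed split)
    "write":  8.0,
    "review": 4.0,
    "test":   5.0,
    "rework": 2.5,    # churn: code rewritten within weeks
}

ai = {
    "write":  baseline["write"] * 0.5,    # vendor claim: 50% faster generation
    "review": baseline["review"] * 1.09,  # 9% code review overhead
    "test":   baseline["test"] * 1.7,     # 1.7x testing burden from defects
    "rework": baseline["rework"] * 2.0,   # doubled code churn
}

human_total = sum(baseline.values())
ai_total = sum(ai.values())
print(f"human: {human_total:.1f}h  ai: {ai_total:.1f}h  "
      f"delta: {100 * (ai_total / human_total - 1):+.1f}%")
# With these assumptions: human 19.5h, ai 21.9h -> +12.1%. The exact
# figure depends entirely on the baseline split; the direction doesn't.
```

The design point of the model is that writing code is a minority of total delivery time, so halving it buys less than the overheads it triggers cost.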

One randomized controlled study went further: developers using AI tools actually took 19% longer to complete tasks than developers without them. The speed of generating code was more than offset by the time spent reviewing, debugging, and rewriting it.

Cortex's 2026 Benchmark Report found that PRs per author increased 20% year over year. That's the productivity story the tools want to tell. But incidents per pull request also increased 23.5%, and change failure rates rose around 30%. More code is shipping. More of it is breaking.

The 66% of developers who say their top frustration is "AI solutions that are almost right, but not quite" are describing something specific. Code that compiles, passes lint, looks correct on first read, but has a subtle logic error or security flaw that takes longer to find than it would have taken to write the code by hand.

The conceptual failure shift

Early AI coding bugs were obvious. Syntax errors, wrong variable names, hallucinated function calls. You'd spot them instantly. The bugs have evolved.

The current generation of AI coding errors is conceptual, the kind of failure a rushed junior developer makes under time pressure. The code works. It passes tests. It handles the happy path. But it misses an edge case that a more experienced developer would have caught, or it implements a pattern that's technically correct but architecturally wrong for the codebase it's going into.
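A constructed example of the class (hypothetical code, not drawn from any of the studies above): a function that type-checks, satisfies the obvious test, and is still wrong in ways only domain knowledge catches.

```python
# An "almost right" conceptual failure. It compiles, passes the
# happy-path test, and reads as correct on first review.

def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price in cents."""
    return price_cents - (price_cents * percent // 100)

assert apply_discount(10_000, 20) == 8_000   # the obvious test; passes

# The edge cases a reviewer has to *know* to look for:
#   apply_discount(10_000, 150) -> -5_000   # no bound check: negative price
#   apply_discount(10_000, -10) -> 11_000   # negative percent raises the price
# Neither trips a type checker, a linter, or the happy-path test. The fix
# requires intent: what counts as a valid discount in this domain?

def apply_discount_safe(price_cents: int, percent: int) -> int:
    if not 0 <= percent <= 100:
        raise ValueError(f"discount must be 0-100, got {percent}")
    return price_cents - (price_cents * percent // 100)
```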

This is harder to catch because it requires understanding intent, not just syntax. A code reviewer has to know what the code should be doing, not just what it is doing. When AI generates the code, the reviewer often doesn't have the mental model that a human author would have built while writing it. The code arrives fully formed but without the reasoning that produced it.

[Image: security dashboard showing vulnerability alerts]

The technical debt accumulation

Seventy-five percent of technology decision-makers expect to face moderate to severe technical debt from AI-accelerated development practices by the end of 2026. That projection comes from multiple independent studies, and I think it's conservative.

The mechanism is straightforward. AI makes it easy to generate large volumes of code quickly. Teams ship faster. Codebases grow. But the code carries more defects, more copy-paste duplication, more security vulnerabilities. Each deployment adds a thin layer of debt that's invisible in the moment but compounds over time.

The teams I talk to describe a specific pattern: the first few months feel amazing. Shipping velocity jumps. Backlogs shrink. Then around month six, bugs start surfacing that are hard to trace. Refactoring becomes painful because nobody fully understands the AI-generated sections. New features break old ones in unexpected ways because the codebase has grown faster than anyone's understanding of it.

What the productive teams do differently

The developers and teams who actually benefit from AI coding tools, and they exist, share a few habits that distinguish them from the teams drowning in AI-generated debt.

They treat AI output as a first draft, never as finished code. Every suggestion gets the same scrutiny as a junior developer's pull request. They have strong test coverage that predates the AI tooling, so new code gets validated against existing behavior. They use AI for the genuinely tedious parts (boilerplate, test scaffolding, config files) and write the complex logic themselves.

Most importantly, they know when to turn it off. When the suggestion is "almost right but not quite," they stop accepting and start writing. The 19% slowdown in that randomized study? I'd bet it correlates with developers who accepted suggestions they should have rejected, then spent time unwinding the damage.

The uncomfortable conclusion

AI coding tools are not getting better at the rate that adoption is growing. The gap between how much we rely on them and how much we should trust them is widening.

This doesn't mean the tools are useless. It means they're tools, with specific strengths and specific failure modes, and we've collectively gotten sloppy about the failure modes. The vendors have no incentive to highlight the churn numbers or the vulnerability rates. They highlight acceptance rates and lines generated.

The correction will come from the teams that start measuring what matters: not how fast code gets written, but how long it survives in production without causing problems. By that metric, 2026 is not looking great.


Marco Kotrotsos writes about practical AI implementation at gloss.run and acdigest.substack.com.

Gloss What This Means For You

Treat AI-generated code as a draft, not an answer: tighten review standards, require threat-model thinking for anything user-facing, and add tests that target edge cases and OWASP top-10 failure modes. Watch your own metrics (churn, incident rate per PR, change failure rate) to see whether AI is actually improving delivery or just increasing output. If review and debugging time is climbing, restrict AI use to well-scoped tasks and insist on clear ownership and documentation for any generated code that ships.
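For the churn number specifically, a crude trend line can be pulled from git history alone. A rough sketch, assuming git is on PATH and the working directory is inside a repository; purpose-built tools like GitClear track line provenance properly, which this proxy does not:

```python
# Crude churn proxy from git history. Real churn metrics track which
# specific lines get rewritten; this just compares deletions in the last
# two weeks against additions in the two weeks before, as a cheap trend
# indicator to watch over time.
import subprocess

def numstat_totals(since: str, until: str) -> tuple[int, int]:
    """Sum (lines added, lines deleted) over commits in a time window."""
    out = subprocess.run(
        ["git", "log", f"--since={since}", f"--until={until}",
         "--numstat", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    added = deleted = 0
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added += int(parts[0])    # binary files report "-" and are skipped
            deleted += int(parts[1])
    return added, deleted

added_before, _ = numstat_totals("4 weeks ago", "2 weeks ago")
_, deleted_recent = numstat_totals("2 weeks ago", "now")
if added_before:
    ratio = 100 * deleted_recent / added_before
    print(f"recent deletions are {ratio:.0f}% of the prior fortnight's additions")
```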