Key Takeaways
  1. AI coding agents are dramatically increasing code output, but they don’t speed up the human review/approval/merge steps that actually determine shipping velocity.
  2. Pull request volume can double or more, shifting the bottleneck to a small set of senior engineers who already carried most review responsibility.
  3. Code review is largely evaluative and context-dependent, so today’s AI review tools only cover a minority of what good reviews require (style/obvious bugs vs. architectural and product judgment).
  4. Organizations are seeing predictable symptoms: growing PR queues, longer time-to-merge, declining review quality, senior-engineer burnout, and higher defect rates reaching production.
  5. The core problem isn’t AI-generated code itself—it’s the mismatch between accelerated creation and unchanged review capacity and decision-making bandwidth.


Your team adopted Cursor three months ago. Pull request volume doubled in the first six weeks. Your developers are writing more code than ever, and somehow your release cadence hasn't changed. The backlog didn't shrink. It moved.

This is the story playing out across thousands of engineering organizations right now, and almost nobody is talking about it. AI coding agents, whether Cursor, Claude Code, GitHub Copilot, or Windsurf, have genuinely accelerated how fast developers produce code. The productivity gains are real. But production code doesn't ship when it's written. It ships when it's reviewed, approved, and merged. That step didn't get faster. It got harder.

The Math That Nobody Did

A senior developer using an AI coding agent can realistically produce 3-5x more code per day than they could twelve months ago. That's not marketing copy; that's what teams are reporting after sustained usage. Sourcegraph's 2024 developer survey found that 76% of developers using AI tools reported meaningful productivity gains in code generation.

Now multiply that across a team of eight engineers. Where you used to see 15-20 pull requests per week, you're now seeing 40-60. Each PR still needs human review. Each one still needs someone with enough context to evaluate whether the code is correct, whether it fits the architecture, whether it introduces subtle regressions that tests won't catch.

The people doing that review are the same three or four senior engineers who were already the bottleneck before AI showed up. They didn't get an AI assistant for code review. They got three times the workload.

Why Review Resists Automation

Writing code and reviewing code are fundamentally different cognitive tasks. Writing is generative. You're translating intent into implementation, and AI is remarkably good at that translation when the intent is clear. Reviewing is evaluative. You're asking whether this implementation is the right one given everything you know about the system, the team, the business constraints, and the deployment environment.

AI-assisted code review tools exist. GitHub's Copilot has review features. CodeRabbit, Codium, and others offer automated review. They catch style issues, flag obvious bugs, and sometimes spot security concerns. That's useful, but it's roughly 20% of what a good code review actually accomplishes.

The other 80% is judgment. Does this abstraction make sense given where the product is heading next quarter? Is this the right tradeoff between performance and readability for a codebase that three new hires will need to understand in six months? Will this data access pattern hold up when the customer base grows 10x?

No AI tool answers those questions reliably today. The context window isn't the problem. The judgment is.

The Organizational Symptoms

If you're an engineering manager or VP, you're probably already seeing the downstream effects even if you haven't connected them to AI adoption yet.

PR queue depth is growing

Average time-to-merge is creeping up. Developers open PRs faster than reviewers can process them. Some teams report 2-3 day review wait times where they used to see same-day turnaround.

Review quality is declining

When reviewers are overwhelmed, they skim. They approve PRs with issues they would have caught six months ago. The approve-and-hope pattern becomes the norm, not the exception.

Senior engineers are burned out on review

Your best architects are spending 60-70% of their time reviewing other people's AI-assisted output instead of designing systems. They're becoming review machines, and they're starting to resent it.

Bug escape rate is climbing

More code with the same (or less) review rigor means more defects reaching production. One fintech company I spoke with saw their post-deploy incident rate increase 40% in the quarter after widespread AI tool adoption, despite having more tests than ever.

What Actually Helps

The fix isn't to slow down AI-assisted coding. That ship sailed. The fix is to restructure how your team handles the review pipeline.

Shrink PR scope aggressively. If AI lets developers write more code faster, the answer isn't bigger PRs. It's smaller, more focused ones that are faster to review. Set hard limits: 200 lines of meaningful change per PR is a reasonable ceiling. AI can break work into small increments just as easily as it can generate large changes.
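
If you want the ceiling to be mechanical rather than aspirational, a small CI check can enforce it. Here's a minimal sketch in Python; the 200-line budget, the base branch name, and the script name are assumptions to adapt to your setup, and it assumes the base branch has been fetched in CI.

```python
# check_pr_size.py - fail CI when a PR exceeds a line-change budget (a sketch, not policy).
import subprocess
import sys

MAX_CHANGED_LINES = 200      # "meaningful change" ceiling; tune per team
BASE_BRANCH = "origin/main"  # assumed base branch name

def changed_lines(base: str) -> int:
    """Count added plus deleted lines between the base branch and HEAD."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":     # binary files report "-" for both counts
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    n = changed_lines(BASE_BRANCH)
    if n > MAX_CHANGED_LINES:
        print(f"PR changes {n} lines; the ceiling is {MAX_CHANGED_LINES}. Split it up.")
        sys.exit(1)
    print(f"PR changes {n} lines; within budget.")
```

In practice you'd exempt lockfiles and generated code, but the point stands: make the limit a check, not a social norm.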

Create review specialization. Stop treating code review as something everyone does equally. Designate review leads per area of the codebase. Give them protected time for review, not as an afterthought bolted onto their feature work.

Use AI for the 20% it's good at. Let automated tools handle style enforcement, test coverage checks, and basic security scanning. Remove that burden from human reviewers entirely so they can focus on architecture and correctness.
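
As one illustration of handing that 20% to machines, a single gate script can run the mechanical checks before any human is asked to review. The specific tools below (ruff, pytest-cov, bandit) are assumptions for a Python codebase; substitute whatever linter, coverage gate, and scanner your stack already uses.

```python
# quality_gate.py - run the mechanical checks before any human review is requested.
# Tool choices (ruff, pytest-cov, bandit) are assumptions for a Python codebase;
# swap in whatever linter, coverage gate, and scanner your stack already uses.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],                        # style and obvious-bug linting
    ["pytest", "--cov=.", "--cov-fail-under=80"],  # tests plus a coverage floor
    ["bandit", "-r", "src", "-q"],                 # basic security scanning
]

if __name__ == "__main__":
    failed = [
        " ".join(cmd)
        for cmd in CHECKS
        if subprocess.run(cmd).returncode != 0     # output streams straight to the CI log
    ]
    if failed:
        print("Blocked before human review:", *failed, sep="\n  - ")
        sys.exit(1)
    print("Mechanical checks passed; ready for human review.")
```

Once those checks block the merge automatically, reviewers can stop commenting on formatting and missing tests entirely.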

Measure review metrics explicitly. Track time-to-first-review, time-to-merge, review depth (comments per PR), and bug escape rate. If you're not measuring the review pipeline, you can't manage it.
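
You don't need a metrics platform to start. A rough sketch against the GitHub REST API can produce time-to-first-review and time-to-merge numbers for recent merged PRs; the repository name is a placeholder, the token comes from the environment, and pagination and business-hours adjustments are left out for brevity.

```python
# review_metrics.py - rough time-to-merge / time-to-first-review from the GitHub REST API.
# Placeholder repo name; samples only the most recently updated PRs (no pagination).
import os
from datetime import datetime
from statistics import median

import requests

REPO = "your-org/your-repo"     # assumed placeholder
API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def merged_prs(limit: int = 50) -> list[dict]:
    """Recently updated closed PRs that were actually merged."""
    resp = requests.get(
        f"{API}/repos/{REPO}/pulls",
        headers=HEADERS,
        params={"state": "closed", "sort": "updated", "direction": "desc", "per_page": limit},
    )
    resp.raise_for_status()
    return [pr for pr in resp.json() if pr.get("merged_at")]

def hours_to_first_review(pr: dict) -> float | None:
    """Hours from PR creation to the first submitted review, or None if never reviewed."""
    resp = requests.get(f"{API}/repos/{REPO}/pulls/{pr['number']}/reviews", headers=HEADERS)
    resp.raise_for_status()
    times = [parse(r["submitted_at"]) for r in resp.json() if r.get("submitted_at")]
    if not times:
        return None
    return (min(times) - parse(pr["created_at"])).total_seconds() / 3600

if __name__ == "__main__":
    prs = merged_prs()
    ttm = [(parse(p["merged_at"]) - parse(p["created_at"])).total_seconds() / 3600 for p in prs]
    ttfr = [h for h in map(hours_to_first_review, prs) if h is not None]
    print(f"Sampled {len(prs)} merged PRs from {REPO}")
    if ttm:
        print(f"Median time-to-merge:        {median(ttm):.1f} hours")
    if ttfr:
        print(f"Median time-to-first-review: {median(ttfr):.1f} hours")
```

Run something like this weekly and watch the trend; the absolute numbers matter less than which direction they're moving.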

Invest in documentation and architecture decision records. The more context that's written down, the less a reviewer needs to hold in their head. AI tools can actually help here, generating ADRs and updating architecture docs as part of the development workflow.

The Uncomfortable Truth

The AI coding revolution created a production asymmetry. We made one side of the pipeline dramatically faster without touching the other side. This isn't a novel problem in engineering. It's the theory of constraints applied to software delivery. Speeding up a non-bottleneck step just creates a bigger pile in front of the actual bottleneck.

Code review is that bottleneck now. It requires trust, context, and judgment, three things that take years to develop in engineers and that we haven't figured out how to replicate in models.

The organizations that figure out their review pipeline in 2026 will ship faster than the ones that just bought everyone Cursor licenses and called it a productivity strategy. The tool isn't the strategy. The workflow is.

What This Means For You

If your team is adopting AI coding tools, start measuring the whole delivery pipeline, not just code output—watch PR queue depth, time-to-merge, and incident rates. Protect senior engineers from becoming full-time reviewers by tightening PR scope, clarifying ownership, and setting explicit review standards so “approve-and-hope” doesn’t become normal. Treat AI review tools as helpers for the easy 20%, but plan for the hard 80% by investing in better context sharing (design notes, architecture guidelines) and making review capacity a first-class constraint.