
Million-token context windows changed what's possible, but most teams are still building for 4K limits.

GPT-5.4 can handle a million tokens. But most application architectures were designed for 4K-32K contexts, and the jump to 1M doesn't just expand capacity; it breaks fundamental assumptions about how you build.
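
Here's a minimal sketch of that 4K-era assumption frozen in code. The constants, the truncation helper, and the token counter are illustrative, not taken from any particular framework:

```python
# Hypothetical context-budget logic written for a 4K-era model.
# The numbers and names are illustrative, not from a real framework.

MAX_CONTEXT_TOKENS = 4096       # the ceiling baked in everywhere circa 2023
RESERVED_FOR_OUTPUT = 512       # headroom for the model's reply


def truncate_history(messages: list[str], count_tokens) -> list[str]:
    """Drop the oldest messages until the prompt fits the budget.

    With a 4K window this is survival; with a 1M window the same
    code silently throws away context the model could have used.
    """
    budget = MAX_CONTEXT_TOKENS - RESERVED_FOR_OUTPUT
    kept: list[str] = []
    total = 0
    # Walk newest-first so recent turns win under the budget.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))


# Example: a crude 4-chars-per-token estimate stands in for a tokenizer.
recent = truncate_history(["turn one", "turn two"], lambda s: len(s) // 4)
```

Logic like this is harmless at 1M tokens only if nothing depends on it, but whole pipelines (chunking, retrieval, summarization cascades) were built to feed it.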

Claude Code treats prompt cache misses like server outages. The engineering behind that decision saves millions in API costs.
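
A sketch of what "a cache miss is an outage" could look like in practice. This is hypothetical, not Claude Code's actual implementation: the threshold, window size, and `CacheMissIncident` class are invented, and only the `cache_read_input_tokens` field mirrors Anthropic's prompt-caching usage metadata.

```python
# Hypothetical monitoring in the spirit of "a cache miss is an outage."
# Thresholds and the incident class are illustrative.

import logging

logger = logging.getLogger("prompt_cache")

# Cached input tokens are billed at a fraction of the uncached rate,
# so a sustained miss is a cost incident, not a soft degradation.
MISS_RATE_ALERT_THRESHOLD = 0.10   # alert if >10% of requests miss
SAMPLE_WINDOW = 50                 # requests before the alarm can fire


class CacheMissIncident(RuntimeError):
    """Raised when the prompt cache miss rate crosses the alert line."""


def record_usage(usage: dict, window: list[bool]) -> None:
    """Track hits and misses from per-request usage metadata."""
    hit = usage.get("cache_read_input_tokens", 0) > 0
    window.append(hit)
    if len(window) > 4 * SAMPLE_WINDOW:   # keep the window bounded
        del window[0]
    if len(window) < SAMPLE_WINDOW:       # wait for a meaningful sample
        return
    miss_rate = 1 - (sum(window) / len(window))
    if miss_rate > MISS_RATE_ALERT_THRESHOLD:
        # Escalate exactly as an availability alert would be.
        logger.error("prompt cache miss rate %.0f%%", miss_rate * 100)
        raise CacheMissIncident(f"miss rate {miss_rate:.0%}")
```

The design choice worth copying is the severity class: every miss re-bills the full prompt at the uncached input rate, so a sustained miss rate gets escalated the way an availability regression would be.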

The frameworks and abstractions built twelve months ago are already getting in the way. The models got good enough that the middleware became the bottleneck.