Experienced software developers walked into 2025 convinced AI would buy them back time. Not minutes. Meaningful chunks of the day. In a controlled field experiment run by Model Evaluation and Threat Research, the math cut the other way. Tasks took 19% longer when AI tools were allowed, even as developers predicted a 24% speedup and walked away feeling 20% faster. Confidence rose. The clock did not cooperate.
This was not a lab demo or a benchmark flex. 16 experienced open-source developers worked inside their own mature codebases, the kind with memory, scars, and strong opinions baked into every directory. Average familiarity ran five years deep. Repositories cleared 1M lines of code and averaged 22K plus stars. Real bugs. Real features. Real refactors. Half the tasks ran without assistance. Half ran with Cursor Pro wired to Claude 3.5 and 3.7 Sonnet. Same developers. Same work. A very different finish.
Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein designed the study to leave little room for hand waving. Within subject randomization. Screen recordings. Clustered standard errors. The slowdown was not about intelligence or novelty. It was friction. The models could write code, but they could not read the room. Architecture, conventions, and unwritten rules slowed everything down. Developers spent time correcting tone and intent, not syntax.
Nate Rush explained it cleanly. Even when output looked impressive, it demanded cleanup before it belonged. Philipp Burckhardt felt the gap firsthand. He wanted to believe the tools helped. Belief, it turns out, does not compile. Anders Humlum’s labor market research shows similar results at scale, with productivity gains landing closer to 3%. Aneesh Raman sees AI pressing hardest at entry level roles. Daron Acemoglu has warned that automation without redesign wastes energy.
The takeaway is not that AI failed. 69% of participants kept using it anyway. The value is real, just mispriced. Speed is not productivity. Perception is not measurement. Tools do not replace judgment in systems shaped by years of tradeoffs and midnight fixes.


