AI does not fix a weak foundation. It amplifies it.
Bolted-on AI
Many teams seeking to leverage AI start by designing agents that bolt onto an existing process or system. We think we are making huge advances, and then we are surprised when the part we automated produces little meaningful change. Sometimes it is worse than that: the effort diverts energy from fundamentals and degrades the system.
Coding agents are a good example. Tools such as Claude Code make the act of writing code dramatically faster. In a controlled GitHub study, developers finished an isolated task, implementing an HTTP server, 55.8% faster with AI assistance.1 At the level that matters, that gain is nearly irrelevant.
The reason is that writing code is a small slice of the job. Microsoft's 2024 time study put hands on the keyboard at roughly a tenth of an engineer's week; the rest is meetings, reviews, coordination, waiting, and rework.2 Speed up a tenth of the work and the arithmetic is unforgiving. Amdahl's Law (the rule that a system's speedup is capped by the part you don't speed up) says that, with coding at about 13% of the work, cutting coding time by 90% (a 10x speedup) lifts total throughput by only about 13%. Make coding instant and free and you still top out near 15%.3
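To make the arithmetic concrete, here is a minimal Python sketch of Amdahl's Law in the form given in the footnote, S = 1 / ((1 − p) + p/k). The function names are mine and come from no library, and the coding shares of 11%, 13%, and 32% are simply the figures quoted in this piece.

```python
def amdahl_speedup(p: float, k: float) -> float:
    """Overall speedup when a fraction p of the work is made k times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - p) + p / k)


def amdahl_ceiling(p: float) -> float:
    """Limit of the speedup as k grows without bound (the sped-up part becomes free)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be at least 0 and below 1")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    # Coding shares taken from the figures cited in the text (illustrative inputs).
    for p in (0.11, 0.13, 0.32):
        print(
            f"coding share {p:.0%}: "
            f"10x coding -> {amdahl_speedup(p, 10):.2f}x overall, "
            f"free coding -> {amdahl_ceiling(p):.2f}x overall"
        )
```

With p = 0.13 this reproduces the figures above: about 1.13x at a 10x coding speedup and about 1.15x if coding costs nothing. Even at the high end of the coding-share estimates, the ceiling stays under 1.5x.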
That is why the field results keep disappointing. In a 2025 randomized trial, experienced developers were measured 19% slower with AI tools, while believing they were about 20% faster.4 The tool sped up the typing. The unchanged system around it (the reviews, the tests, the integration, the handoffs) absorbed the difference and then some.
| Measure | Result |
|---|---|
| Coding subtask, in isolation (GitHub RCT) | 55.8% faster |
| Whole-system ceiling when coding is ~13% of the work (Amdahl) | ~13% gain at a 10x coding speedup; ~15% if coding is free |
| Experienced developers, real repositories (METR, 2025) | 19% slower |
The lesson is not that the tool is weak. It is that we bolted a fast engine onto an unchanged system of work — and the system, not the engine, sets the ceiling.
The socio-technical foundation
The foundation is a combination of architecture, people, and process. Engineers have a name for that combination: a socio-technical system — a technical subsystem (tools, data, architecture) and a social subsystem (people, roles, the way work flows) that only produce value together.
The idea is older than software. In 1951, Eric Trist and Ken Bamforth at the Tavistock Institute studied British coal mines that had just mechanized. The new machinery, in their words, was technically optimized but not jointly optimized: it was dropped into an unchanged social system of work. Productivity never rose to match the mechanization, and absenteeism climbed to around 20%. The older, "primitive" hand-got method outperformed it — because it was organized around small, multi-skilled teams that shared the whole task.5 The technology improved. The outcome got worse. The variable was the system, not the machine.
That study founded socio-technical systems theory, and its central principle is joint optimization: you cannot optimize the whole by optimizing the technical and social subsystems separately. Push on one and ignore the other and you sub-optimize the system. Writing decades later for software engineers, Baxter and Sommerville put it plainly — the failures we label "technical" are usually symptoms of a deeper socio-complexity, the friction between people, process, and tools, not a fault in the tools themselves.6
This is the same trap I described with coding agents, now stated as a law — and it is exactly where AI research is converging. Surveying how to evaluate generative AI, Weidinger and colleagues argue that testing a model's capability in isolation is only the first of three layers; you also have to evaluate the human interaction at the point of use and the systemic impact on the structures the system is embedded in, because "context determines whether a given capability may cause harm."7 The capability is not the outcome. The context is.
The newest work on agentic AI is blunter still: an AI system's "behavior and impact are co-produced by algorithms, data, organizational practices, regulatory frameworks, and social norms," and integrating it "requires changes to institutional roles, accountability structures, and workplace norms, rather than the simple introduction of new tools."8 You do not get the gains by adding the agent. You get them by redesigning the socio-technical system it runs inside.
So the foundation has two interlocking subsystems, and AI amplifies both. The technical substrate is the data, the architecture, the identity model, the surfaces your software is built on. The social subsystem is the system of work: the SDLC, the org, the approvals, the handoffs your people run inside. Optimize one and starve the other, and you have done what the coal mines did in 1951.
This is not new, and it is not specific to AI. Kentaro Toyama spent years studying technology in international development and arrived at what he calls the Law of Amplification.
Law of Amplification: technology's primary effect is to amplify underlying human and institutional intent and capacity; it does not substitute for their absence.9
AI is the most powerful amplifier we have ever built, and the law still holds. The same model produces opposite outcomes depending on the foundations beneath it. Bolt a capable model onto a stack of disconnected point solutions and it amplifies the disconnection: faster wrong answers, confident summaries over stale data, more surface area to mistrust. Put the same model on a coherent foundation — one source of truth, governed data, clean events — and it compounds. Same engine. Opposite result.
The model is not the variable. The foundation is.
Why a model amplifies its foundations instead of transcending them
It would be convenient if the model were a clean layer on top, sealed off from the mess below. It is not.
In Hidden Technical Debt in Machine Learning Systems, Sculley and colleagues at Google name the principle CACE: Changing Anything Changes Everything. A machine-learning system is entangled with the data and the systems around it; it inherits and magnifies their properties rather than standing apart from them.10 The model is bonded to its foundations.
Conway's Law adds the organizational version: systems mirror the communication structures of the organizations that build them.11 It is the socio-technical principle in action, the social subsystem dictating the shape of the technical one. AI does not escape Conway's Law; it broadcasts it. An assistant built over fragmented domains can only see what the fragmentation chooses to expose, and it narrates the result in the smooth voice of a single system while quietly inheriting the seams of many. The user hears one confident answer and never sees the four disconnected sources it papered over. The shape of the work becomes the shape of the output.
We inherit the old system's limits without noticing
Here is the harder part. A system built around an older technology encodes that technology's limits as assumptions, and the assumptions outlive the technology. Because they are baked into the tools, the metrics, the approvals, and the org chart, we stop seeing them as choices. We optimize inside the inherited contours instead of questioning them — even though the contours are where the largest gains are hiding.
The industry's own data shows the trap. In 2024, DORA found that as AI adoption rose, software delivery got worse: a 25% increase in AI adoption was associated with an estimated 1.5% drop in throughput and a 7.2% drop in delivery stability — even as developers reported feeling more productive.12 Faster code, slower delivery. The tool was amplifying a system that had not changed.
A year later, with more teams adapting, DORA reached a conclusion it stated almost exactly the way I would: "AI's primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organisations and the dysfunctions of struggling ones." And the punchline: "The greatest returns on AI investment come not from the tools themselves, but from a strategic focus on the underlying organizational system."13 Throughput finally turned positive, but delivery instability kept rising, because teams had adapted for speed while their systems had not yet evolved to absorb it.
We have seen this exact lag before. The electric motor reached the factory in the 1890s; the productivity gains did not arrive until the 1920s — roughly three decades later — because factories kept laying themselves out around the geometry of the old steam plant. The new power source was bolted into the old contour, and the payoff waited on someone willing to redraw the floor plan.14 I have written before about why we keep building the new inside the shape of the old. This is the same trap, now with delivery metrics attached.
Guiding policy: redraw the foundation
We judge an AI investment by the foundation it amplifies: the data and architecture it is built on, and the system of work it runs inside. We jointly optimize both subsystems, never one alone. If the foundation is weak, we fix the foundation first.
What to do instead
- Map the foundation before the feature. For any AI capability, name the data it reads, who owns that data, where the source of truth lives, and what it does when the data is wrong. This is the diagnosis step, applied to AI.
- Redesign the system, don't just automate it. When coding gets ten times faster, the bottleneck simply moves to review, testing, integration, and coordination. Rethink the system from first principles. Surface hidden assumptions.
- Separate truth from interpretation. Use deterministic systems for ledgers, rules, state, and permissions (the things that must be exactly right). Use the model for intent, language, orchestration, and explanation. Handing deterministic work to a probabilistic system is the most common and most expensive miss.
- Make data a context surface. Governed domains exposing canonical events beat a model scraping screens. The model can only amplify what it can actually see.
- Amplify one coherent loop, not ten disconnected screens. Pick a single end-to-end slice of work and make the foundation under it solid. A narrow, deep win compounds; ten shallow agents fragment.
- Instrument trust. Explainable evidence, evals, regression tracking across model versions. If you cannot measure success or show why the system did what it did, you may not be improving as much as you think.
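One way to picture the split between truth and interpretation from the list above is the sketch below. It is an illustration, not a reference design: `Ledger`, `Intent`, `keyword_interpreter`, and `answer` are hypothetical names, and the keyword matcher merely stands in for a model call so the example runs on its own. The point is the boundary: the interpreter decides what is being asked, and only the ledger produces the number.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable


@dataclass
class Ledger:
    """Deterministic source of truth: balances come only from recorded entries."""
    entries: dict[str, list[Decimal]] = field(default_factory=dict)

    def post(self, account: str, amount: Decimal) -> None:
        self.entries.setdefault(account, []).append(amount)

    def balance(self, account: str) -> Decimal:
        if account not in self.entries:
            raise KeyError(f"unknown account: {account}")
        return sum(self.entries[account], Decimal("0"))


@dataclass(frozen=True)
class Intent:
    action: str
    account: str


def keyword_interpreter(text: str) -> Intent:
    """Hypothetical stand-in for a model call: maps free text to a structured intent."""
    words = text.lower().replace("?", "").split()
    if not words:
        return Intent("unknown", "")
    action = "balance" if "balance" in words else "unknown"
    return Intent(action, words[-1])  # assumes the account name is the last word


def answer(text: str, ledger: Ledger, interpret: Callable[[str], Intent]) -> str:
    intent = interpret(text)                 # interpretation: what is being asked
    if intent.action != "balance":
        return "Sorry, I can only report balances."
    amount = ledger.balance(intent.account)  # truth: the number comes from the ledger
    return f"The balance of {intent.account} is {amount}."


if __name__ == "__main__":
    ledger = Ledger()
    ledger.post("ops", Decimal("120.50"))
    ledger.post("ops", Decimal("-20.25"))
    print(answer("What is the balance of ops?", ledger, keyword_interpreter))
    # prints: The balance of ops is 100.25.
```

Swapping `keyword_interpreter` for a real model changes how well requests are understood, but it cannot change what the balance is.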
Risks and signals
The risk is amplifying fragmentation faster than you can govern it. Watch for two signals.
The first is the perception gap: individuals feel dramatically faster while the system does not move. The METR developers were certain AI sped them up; the stopwatch disagreed. If your engineers love the tool but your lead time, throughput, and stability are flat, the tool is working and the system is eating the gains.
The second is the confidence gap: serve answers over stale or conflicting data, and users will quietly re-check the work. Again, the system is eating the gains.
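The perception gap can be checked mechanically. The sketch below is one hedged way to do it: it flags the case where people report a speedup but none of lead time, throughput, or stability has improved. The `DeliverySnapshot` fields, the function names, and the 5% tolerance are all assumptions chosen for illustration; substitute whatever delivery metrics you already track.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliverySnapshot:
    lead_time_days: float       # lower is better
    deploys_per_week: float     # throughput proxy, higher is better
    change_failure_rate: float  # stability proxy, lower is better


def improvement(before: float, after: float, lower_is_better: bool) -> float:
    """Relative improvement, positive when the metric moved in the good direction.

    Assumes the baseline value is nonzero.
    """
    change = (after - before) / before
    return -change if lower_is_better else change


def perception_gap(
    perceived_speedup: float,
    before: DeliverySnapshot,
    after: DeliverySnapshot,
    tolerance: float = 0.05,  # illustrative threshold, not a standard
) -> bool:
    """True when people report a speedup but no delivery metric improved beyond the tolerance."""
    gains = [
        improvement(before.lead_time_days, after.lead_time_days, lower_is_better=True),
        improvement(before.deploys_per_week, after.deploys_per_week, lower_is_better=False),
        improvement(before.change_failure_rate, after.change_failure_rate, lower_is_better=True),
    ]
    return perceived_speedup > tolerance and max(gains) <= tolerance


if __name__ == "__main__":
    before = DeliverySnapshot(lead_time_days=5.0, deploys_per_week=10.0, change_failure_rate=0.10)
    after = DeliverySnapshot(lead_time_days=5.0, deploys_per_week=10.0, change_failure_rate=0.12)
    # Engineers report feeling 20% faster; delivery metrics are flat or worse.
    print(perception_gap(0.20, before, after))
```

A flag here does not say the tool is failing. It says the gains are being absorbed somewhere between the keyboard and production.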
The amplifier is already here
The leverage is enormous, and it runs in both directions. That is the part teams underestimate. A weak foundation does not stay neutral under AI; it gets worse, faster, and with more conviction. A strong one compounds.
Everyone will have access to roughly the same models. What will differ is whether the thing underneath is worth amplifying: the data and architecture your software stands on, and the system of work people stand in. Automate and you get a faster version of what you already are. Redraw the foundation and you change what is possible.
Footnotes
1. Sida Peng et al., "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot" (2023): an isolated task completed 55.8% faster. arxiv.org/abs/2302.06590
2. Microsoft Research, "Time Warp: The Gap Between Developers' Ideal vs. Actual Workweeks" (2024): about 11% of the workweek is spent coding. Estimates range ~10–32% depending on how "coding" is defined; the conclusion below holds across that range. microsoft.com
3. Gene M. Amdahl (1967). Overall speedup S = 1 / ((1 − p) + p/k), where p is the fraction of work sped up and k is that part's speedup. With p ≈ 0.13: a 10x coding speedup (k = 10, i.e. 90% less coding time) yields S ≈ 1.13; making coding free (k → ∞) yields S ≈ 1.15.
4. METR, "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity" (2025): a 19% slowdown despite developers believing they were faster. arxiv.org/abs/2507.09089
5. Eric L. Trist & Ken W. Bamforth, "Some Social and Psychological Consequences of the Longwall Method of Coal-Getting," Human Relations 4(1) (1951): the founding socio-technical study; "technically optimized but not jointly optimized." journals.sagepub.com
6. Gordon Baxter & Ian Sommerville, "Socio-technical systems: From design methods to systems engineering," Interacting with Computers 23(1) (2011): joint optimization in software systems. doi.org
7. Laura Weidinger et al., "Sociotechnical Safety Evaluation of Generative AI Systems" (2023): capability, human-interaction, and systemic layers; "context determines whether a given capability may cause harm." arxiv.org/abs/2310.11986
8. "Socio-technical Aspects of Agentic AI" (arXiv preprint, 2026). arxiv.org/abs/2601.06064
9. Kentaro Toyama, Geek Heresy: Rescuing Social Change from the Cult of Technology (PublicAffairs, 2015): the Law of Amplification. geekheresy.org
10. D. Sculley et al., "Hidden Technical Debt in Machine Learning Systems," NeurIPS (2015): the CACE principle. papers.nips.cc
11. Melvin E. Conway, "How Do Committees Invent?", Datamation (1968). melconway.com
12. DORA, Accelerate State of DevOps 2024: AI adoption associated with reduced delivery throughput and stability. dora.dev
13. DORA, State of AI-assisted Software Development 2025: "AI's primary role… is that of an amplifier." dora.dev/dora-report-2025
14. Paul A. David, "The Dynamo and the Computer: An Historical Perspective on the Modern Productivity Paradox" (1989). gwern.net