- Jorge Cárdenas

- Dec 22, 2025
Artificial intelligence will almost certainly drive more computing, but its impact on electricity demand is far harder to pin down than most forecasts assume. The decisive uncertainty is not how fast AI adoption grows but how quickly efficiency improves in models, systems, and hardware, which reshapes how much power is ultimately needed to deliver AI at scale.
A familiar intuition and where it goes wrong
Electricity forecasters are used to stable relationships. When economies grow, power demand usually rises. The exact path varies with weather, prices, and policy, but the direction is predictable. Artificial intelligence appears to fit the same pattern. More AI use should mean more computing. More computing should mean more electricity.
The problem is that this relationship is not fixed. The link between how much AI is used and how much power data centres draw keeps changing, and that instability does not stem mainly from uncertainty about adoption. On that front, the direction is clear. Enterprise AI adoption rose from roughly 55% of organisations to nearly 80% in a single year, according to Stanford’s AI Index (Stanford HAI, 2025). Deloitte projects that inference, i.e. the day-to-day use of AI models, will account for roughly two-thirds of total AI compute by 2026, up from about one-third only a few years ago (Deloitte, 2024).
Even analysts who disagree on how steep the curve will be tend to agree on its direction: more AI, used more often, in more places. The deeper challenge lies elsewhere. AI systems are becoming far more efficient, and not in a smooth or predictable way. That makes the mapping from AI growth to electricity demand uncertain in a deeper sense than simple forecasting error.
A simple framework for a complex system
A clearer way to think about AI’s electricity footprint is to break it into five components.

Figure 1: The five components linking AI growth to electricity demand
At a system level, AI electricity demand depends on:
Workload volume: how much AI is used
Compute per unit of work: how much calculation each task requires, i.e. how demanding the task is
System efficiency: how effectively installed hardware is used
Energy per unit of compute: how efficient hardware and software are
Facility overhead: extra power for cooling and electricity delivery
None of these elements is new or unknown. What is new is how quickly several of them are shifting at the same time, and because they multiply one another, it is their interaction that makes forecasting difficult: a small difference in the assumption for any one term scales the entire outcome.
The first three components determine total compute requirements; the last two turn that compute into electricity demand. Seen this way, the forecasting challenge becomes clearer: even if you feel confident about adoption, you still have to guess how the other four terms evolve. The sections below examine each of the five components in turn.
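One way to make the multiplication concrete is to write the framework as a product of the five terms. The sketch below is an illustration under stated assumptions, not a model taken from any cited source: the class and function names are hypothetical, every input is a placeholder value, and system efficiency is approximated as a utilisation share that scales energy up by 1 / utilisation (which assumes idle hardware draws close to full power).

```python
from dataclasses import dataclass, replace

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


@dataclass(frozen=True)
class AIDemandAssumptions:
    """Hypothetical inputs for the five-component framework (illustrative only)."""

    tasks_per_year: float    # 1. workload volume
    flop_per_task: float     # 2. compute per unit of work
    utilisation: float       # 3. system efficiency, as a share in (0, 1]
    joules_per_flop: float   # 4. energy per unit of compute at full load
    pue: float               # 5. facility overhead (power-usage effectiveness)


def annual_electricity_kwh(a: AIDemandAssumptions) -> float:
    """Multiply the five terms into annual facility electricity demand (kWh).

    Simplifying assumption: idle hardware draws close to full power, so a
    utilisation share u scales energy up by 1 / u.
    """
    if not 0 < a.utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    useful_flop = a.tasks_per_year * a.flop_per_task              # terms 1 and 2
    it_joules = useful_flop * a.joules_per_flop / a.utilisation   # terms 3 and 4
    return it_joules * a.pue / JOULES_PER_KWH                     # term 5


# Placeholder baseline values, not estimates.
baseline = AIDemandAssumptions(
    tasks_per_year=1e12, flop_per_task=1e14, utilisation=0.5,
    joules_per_flop=1e-12, pue=1.5,
)
# Halve compute per task and energy per unit of compute independently.
leaner = replace(baseline, flop_per_task=5e13, joules_per_flop=5e-13)
ratio = annual_electricity_kwh(baseline) / annual_electricity_kwh(leaner)
print(f"{ratio:.1f}")  # prints 4.0
```

Because the terms multiply, two independent 2× improvements compound into a 4× reduction whatever baseline values are chosen, which is why a small error in any single term propagates to the whole forecast.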
1: Workload volume: growth is certain, shape is not
AI usage is rising across businesses, consumer products, and devices. That much is clear.
What is far less clear is what “one use of AI” actually looks like. A short text reply costs one amount of computation. An interaction that also processes images, audio, or long documents costs more. A simple question-and-answer costs one thing; an AI “agent” that quietly performs multiple steps, such as searching, planning, checking and revising, costs much more.
Even design choices that users never see can matter. Systems designed to respond instantly tend to use more power per request. Systems that allow short delays can quietly process many requests together and run more efficiently. Where the work happens also matters: some AI tasks increasingly run directly on phones and laptops rather than in large data centres, shifting that electricity demand out of data centres and onto end-user devices (Google, 2024; Apple, 2024).
So workload volume will grow, but the route it takes (whether it becomes burstier, more embedded, more agentic, or more on-device) matters as much as the headline adoption rate.
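The batching trade-off described in this section can be illustrated with a toy model. It assumes, hypothetically, that each batch pays a fixed energy cost shared by every request in it (for example, reading model weights from memory) plus a small cost per request, and that a batch holds however many requests arrive during the permitted wait. The function name and all energy figures are illustrative placeholders, not measurements.

```python
import math


def energy_per_request_j(
    arrival_rate_per_s: float,
    max_wait_s: float,
    fixed_j_per_batch: float = 50.0,      # hypothetical cost shared by a batch
    marginal_j_per_request: float = 1.0,  # hypothetical cost per request
    max_batch: int = 64,                  # hypothetical batch-size limit
) -> float:
    """Toy model: energy per request when requests arriving within a wait window are batched."""
    # Requests that arrive while the system waits are served in one batch.
    expected_batch = math.floor(arrival_rate_per_s * max_wait_s)
    batch = max(1, min(max_batch, expected_batch))
    return fixed_j_per_batch / batch + marginal_j_per_request


instant = energy_per_request_j(arrival_rate_per_s=200, max_wait_s=0.0)
patient = energy_per_request_j(arrival_rate_per_s=200, max_wait_s=0.25)
print(instant, patient)  # prints 51.0 2.0
```

In this toy model, tolerating a quarter-second delay spreads the fixed cost over 50 requests; real gains depend on the model, the hardware, and the traffic pattern.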
2: Compute per unit of work: a key uncertainty
Compute per unit of work means how much calculation is required to deliver a useful AI result.
Two systems can produce answers that feel equally good to a user while doing very different amounts of internal work. One may rely on a large, general model for every task. Another may route easy tasks to smaller models, reuse previous results, or rely on better algorithms that avoid unnecessary computation. This is where the mapping from AI use to electricity becomes unstable.
A clear signal of these gains is cost. Stanford’s AI Index reports that the cost of delivering GPT-3.5-level performance fell by more than 280× between late 2022 and late 2024, driven by a combination of smaller models, better system design, and hardware progress (Stanford HAI, 2025).
Public API prices compress many efficiency gains into a single number: hardware improvements, algorithmic advances, architectural choices, and operational efficiency.
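To put the 280× figure in annual terms, the sketch below converts it into a constant yearly improvement factor and a cost-halving time. It assumes a two-year window and a steady exponential decline, both simplifications: the Stanford figure spans roughly late 2022 to late 2024, and real price cuts arrive in steps. It also measures cost, which bundles hardware, algorithmic, and pricing effects, not energy alone.

```python
import math


def annualised_factor(total_factor: float, years: float) -> float:
    """Constant yearly improvement factor that compounds to total_factor."""
    return total_factor ** (1 / years)


def halving_time_months(total_factor: float, years: float) -> float:
    """Months for cost to halve under a constant exponential decline."""
    return 12 * years * math.log(2) / math.log(total_factor)


per_year = annualised_factor(280, 2.0)  # assumes a two-year window
halving = halving_time_months(280, 2.0)
print(f"{per_year:.1f}x per year, halving every {halving:.1f} months")
# prints: 16.7x per year, halving every 3.0 months
```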

Figure 2: Cost per output token across major AI providers, including all models (reasoning, non-reasoning, multiple context windows)
Figure 2 illustrates why this term is so difficult to forecast. Holding output quality constant, cost per output token trends downward over time, yet the dispersion at any given moment is wide. Two systems delivering similar user-visible results can require amounts of computation that differ by orders of magnitude.
That scatter doesn’t mean efficiency stopped improving. It means providers are using efficiency gains to deliver better capability (smarter answers, longer context, multimodal features, tool use, agent-like behaviour) instead of simply cutting the per-token price. Tokens have also become harder to compare directly, because some reasoning products perform extra, hidden computation at inference time (sometimes exposed as “reasoning tokens” or “thinking” levels), so one “output token” can represent more work than it used to. The right way to read the chart, then, is not to look for a single average trendline but to watch the range: the cheapest options can keep getting cheaper while the premium ceiling rises, as advanced reasoning models are priced for value and scarce capacity.
This spread is not noise. It reflects real strategic choices about model design, routing, reasoning depth, system architecture, and utilisation. Forecasts that extrapolate a single efficiency trendline assume away the very mechanism doing most of the work.
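The routing and reuse choices discussed in this section can be sketched as a blended average of compute per task. The example assumes, hypothetically, a small model that needs 100× less compute per task than a large one and treats cached answers as costing no model compute; the function name and every number are illustrative.

```python
def blended_flop_per_task(
    share_small: float,
    small_flop: float,
    large_flop: float,
    cache_hit_rate: float = 0.0,
) -> float:
    """Average compute per task when a router sends easy tasks to a small model."""
    if not (0 <= share_small <= 1 and 0 <= cache_hit_rate < 1):
        raise ValueError("share_small must lie in [0, 1] and cache_hit_rate in [0, 1)")
    model_flop = share_small * small_flop + (1 - share_small) * large_flop
    # Simplification: a cached answer is treated as costing no model compute.
    return (1 - cache_hit_rate) * model_flop


# Hypothetical sizes: the small model needs 100x less compute per task.
SMALL, LARGE = 1e12, 1e14
one_big_model = blended_flop_per_task(0.0, SMALL, LARGE)
routed_and_cached = blended_flop_per_task(0.9, SMALL, LARGE, cache_hit_rate=0.5)
print(f"{one_big_model / routed_and_cached:.1f}x less compute per task")
# prints: 18.3x less compute per task
```

Under these assumptions, two products that feel equivalent to users differ by more than an order of magnitude in compute per task purely through routing and caching, which is the kind of spread a single trendline hides.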
3: System efficiency: how busy the machines are
Electricity use depends not just on how much hardware is installed, but on how well it is used. Machines that sit idle or are poorly coordinated still draw power without delivering useful output. Systems that organise work efficiently can deliver the same results with far less electricity.
This is largely an operational issue, not a technological one. NVIDIA has shown that modest changes in how chips are run can deliver double-digit energy savings with minimal performance impact (NVIDIA, 2024). Google has described dynamically shifting AI workloads to reduce idle hardware, cutting wasted power without changing demand (Google Cloud, 2024).
Two data centres with the same equipment can therefore draw very different amounts of power for the same useful output. For planners, the timing mismatch matters: operational efficiency can change within months, while grid infrastructure takes years to build.
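The effect of utilisation on identical equipment can be sketched with a linear power model, a common simplification in which hardware draws a fixed idle share of its peak power plus a part that scales with load. The 30% idle share, the function name, and the normalised units are hypothetical.

```python
def kwh_per_unit_of_work(
    utilisation: float,
    peak_kw: float = 1.0,                 # hypothetical draw at full load
    idle_fraction: float = 0.3,           # hypothetical idle draw, share of peak
    units_per_hour_at_peak: float = 1.0,  # hypothetical throughput at full load
) -> float:
    """Energy per unit of useful work under a linear power model."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    # Linear model: a fixed idle share plus a part proportional to load.
    power_kw = peak_kw * (idle_fraction + (1 - idle_fraction) * utilisation)
    useful_units_per_hour = units_per_hour_at_peak * utilisation
    return power_kw / useful_units_per_hour


for u in (0.25, 0.75):
    print(f"utilisation {u:.0%}: {kwh_per_unit_of_work(u):.2f} kWh per unit")
# prints:
# utilisation 25%: 1.90 kWh per unit
# utilisation 75%: 1.10 kWh per unit
```

With identical hardware, the less busy facility in this sketch uses about 1.7× more electricity per unit of useful work.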
4: Energy per unit of compute: hardware progress in practice
When people talk about “AI’s energy appetite”, they often picture today’s hardware scaled up. But energy per unit of compute is not static. It improves with each generation of accelerators, as performance per watt rises and memory and interconnect designs become more efficient, and with software that extracts more useful work from the same silicon.
A concrete example comes from Google’s AI accelerators. Public disclosures show that newer generations of Tensor Processing Units deliver multiple-fold improvements in performance per watt compared with earlier designs (Google, 2023). Similar gains are reported across NVIDIA’s accelerator roadmap (NVIDIA, 2024).
These improvements compound over time. Assuming that today’s energy per unit of compute will still apply several years from now has little historical support.
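A short calculation shows how quickly a frozen assumption diverges. The sketch assumes, purely for illustration, that each hardware generation halves energy per unit of compute and that a new generation arrives every two years; neither figure comes from the Google or NVIDIA disclosures cited above, and the function name is hypothetical.

```python
def energy_per_compute_after(
    years: float,
    gain_per_generation: float,
    years_per_generation: float,
    today: float = 1.0,
) -> float:
    """Energy per unit of compute after `years`, relative to `today`."""
    generations = years / years_per_generation
    return today / gain_per_generation ** generations


# Hypothetical cadence: one generation every two years.
frozen = energy_per_compute_after(6, gain_per_generation=1.0, years_per_generation=2)
improving = energy_per_compute_after(6, gain_per_generation=2.0, years_per_generation=2)
print(frozen / improving)  # prints 8.0
```

Under those assumptions, a forecast that holds today's energy per unit of compute fixed would overstate the energy needed for the same compute six years out by 8×.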
5: Facility overhead: important, but bounded
Data centres also consume power for cooling and electricity delivery, commonly measured by power-usage effectiveness (PUE). According to the Uptime Institute, the global average PUE is around 1.54, while best-in-class hyperscale operators report values close to 1.1 (Uptime Institute, 2024). Moving from average to best-in-class can materially reduce total energy use for the same computing load.
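These gains matter, but they are bounded. PUE cannot fall below 1.0, the point at which all of a facility’s electricity goes to the computing equipment itself, so buildings can only be cooled and powered so efficiently. Compared with the large swings seen in compute per task or hardware efficiency, facility overhead is rarely the decisive factor.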
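The bound can be checked with the PUE figures quoted in this section. For a fixed IT load, total facility energy scales with PUE, so the saving from improving PUE is 1 - new/old. The helper name is ours; 1.54 and 1.1 are the Uptime Institute figures cited above, and 1.0 is the theoretical floor.

```python
def overhead_saving(pue_from: float, pue_to: float) -> float:
    """Fractional cut in total facility energy for the same IT load.

    Facility energy = IT energy x PUE, so the IT load cancels out.
    """
    if pue_from < 1.0 or pue_to < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1 - pue_to / pue_from


print(f"{overhead_saving(1.54, 1.10):.1%}")  # average -> best-in-class: 28.6%
print(f"{overhead_saving(1.54, 1.00):.1%}")  # average -> theoretical floor: 35.1%
```

Even a perfect facility caps the saving at about 35%, whereas the compute and hardware terms can move by multiples.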
Rethinking what we forecast
None of this means that electricity planners should throw up their hands. It suggests they should change the object they are forecasting.
Instead of a single trajectory, AI demand should be treated as a distribution shaped largely by efficiency paths. The centre of gravity shifts from “How fast will adoption rise?” to “How quickly will compute productivity per dollar and per kilowatt-hour improve, and how will product design respond?”
Those are measurable questions. Price-per-token data, utilisation indicators, and the growing share of AI tasks running directly on devices rather than in data centres all provide signals that update continuously. Google’s move to run lightweight AI models directly on smartphones, and Apple’s decision to let developers use built-in AI models on devices, are examples of shifts that can redirect computing and electricity demand across the stack rather than simply increase it (Google, 2024; Apple, 2024).
Decision-makers can still act. Grid upgrades, siting strategies, power procurement, and demand-response arrangements can all be structured around ranges, triggers, and optionality. But the assumptions must be explicit. A credible forecast should state what it assumes about compute per unit of work, system efficiency, hardware improvements, and where AI tasks run.
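Treating demand as a distribution, with every assumption written down, can be sketched as a small Monte Carlo simulation. Each range below is a hypothetical placeholder chosen for illustration, not an estimate drawn from the sources cited in this article; each term is sampled log-uniformly as a five-year change factor, and the five draws are multiplied, as in the framework.

```python
import math
import random
import statistics


def log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Sample so that equal ratios within [low, high] are equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


# Hypothetical five-year change factors per term (1.0 = no change). Placeholders only.
RANGES = {
    "workload_volume": (5.0, 20.0),     # more AI use
    "compute_per_task": (0.2, 3.0),     # leaner models vs. agents and reasoning
    "inverse_utilisation": (0.6, 1.0),  # better scheduling, less idle hardware
    "energy_per_compute": (0.1, 0.5),   # hardware and software gains
    "pue": (0.8, 1.0),                  # facility improvements
}


def sample_demand_multiplier(rng: random.Random) -> float:
    """Multiply one random draw of every term into a demand-change factor."""
    return math.prod(log_uniform(rng, lo, hi) for lo, hi in RANGES.values())


rng = random.Random(42)  # fixed seed so runs are reproducible
draws = [sample_demand_multiplier(rng) for _ in range(10_000)]
deciles = statistics.quantiles(draws, n=10)  # nine cut points: P10 ... P90
p10, p50, p90 = deciles[0], deciles[4], deciles[8]
print(f"P10 {p10:.2f}x  median {p50:.2f}x  P90 {p90:.2f}x")
```

The spread between P10 and P90 matters more than the median, and rerunning with one range narrowed shows which assumption drives that spread, which in turn suggests which indicators are worth tracking as triggers.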
If those assumptions are left implicit, the model may look clean while reality turns out noisy. The world will almost certainly use more AI. Compute demand will almost certainly rise. Yet electricity demand might surge, or it might rise far less than many expect. Because grid infrastructure decisions must be made years in advance while AI electricity demand remains highly uncertain, planners should avoid overconfident forecasts and instead design flexible investment strategies. AI demand must be managed with humility and optionality, because the decisive factor is not how many people adopt AI but how much computation is ultimately needed to deliver it.
Key References
Deloitte (2024): AI Predictions
Google (2023): Tensor Processing Unit performance and efficiency
Google Cloud (2024): Optimising AI workloads and accelerator utilisation
International Energy Agency (2025): Electricity 2025: Analysis
NVIDIA (2024): Accelerated computing power and efficiency
Stanford University, Human-Centered AI (2025): AI Index Report
Uptime Institute (2024): Global Data Center Survey
