Training the model behind ChatGPT’s 2023 leap forward cost OpenAI roughly $78m in computing time, by the estimates of Stanford’s AI Index and the research group Epoch AI. Answering the question you typed into it this morning cost a fraction of a cent. Between those two numbers lies the strangest cost structure in modern business, and most of what is odd about the AI industry — the gigantism, the secrecy, the handful of firms — follows from it.

Most businesses do not look like this. A baker who sells twice as many loaves buys twice as much flour; a haulier who carries twice the freight burns twice the diesel. Economists call these marginal costs (the cost of one more unit), and in most industries they dominate. Fixed costs, the ones you pay before selling anything, are real but bounded: the oven, the lorry. AI inverts the proportions. Nearly everything is paid up front, in the colossal one-off effort of training a model, and almost nothing is paid per use. Epoch AI estimates that the cost of training the largest models has grown two- to threefold each year for nearly a decade; Google’s Gemini Ultra ran to an estimated $191m in compute, and billion-dollar training runs are expected within a year or two, as of mid-2026. Serving answers, meanwhile, costs pennies per thousand queries. That is not nothing: unlike pure software, every answer burns real electricity and chip-time, and across billions of queries the industry’s total serving bill has grown into one of its largest running costs, by some estimates now rivalling what it spends on training. The economics of this chapter survives that caveat, because what drives everything below is the cost per answer, and the cost of serving any single answer remains a rounding error beside the up-front cost of training.

When fixed costs are huge and marginal costs are tiny, a simple piece of arithmetic takes over. The average cost of an answer — total spending divided by answers served — falls relentlessly as usage grows, because the same mountainous fixed cost is spread across ever more queries. A model that serves a billion users is, per answer, vastly cheaper than an identical model serving a million, even though the two cost the same to build. That gives the biggest producer a built-in price advantage no small rival can match, which is why industries with this shape — software, aircraft, telecoms networks — tend to end up with few producers. The economics of this chapter is the seed of the concentration described in the next.
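
The arithmetic can be made concrete with a rough sketch. It borrows the $78m training estimate quoted at the start of the chapter; the serving cost of a tenth of a cent per answer and the two usage volumes are illustrative assumptions, not reported figures, and the function name is invented for the example.

```python
def average_cost_per_answer(fixed_cost: float, marginal_cost: float, answers: int) -> float:
    """Average cost = (fixed cost + marginal cost x answers) / answers."""
    if answers <= 0:
        raise ValueError("answers must be positive")
    return (fixed_cost + marginal_cost * answers) / answers


TRAINING_COST = 78_000_000  # dollars; the training estimate quoted in the chapter
SERVING_COST = 0.001        # dollars per answer; an assumed "fraction of a cent"

# Two identical models, one lightly used and one heavily used (assumed volumes).
small = average_cost_per_answer(TRAINING_COST, SERVING_COST, 1_000_000)
large = average_cost_per_answer(TRAINING_COST, SERVING_COST, 1_000_000_000)

print(f"1 million answers: ${small:,.3f} per answer")
print(f"1 billion answers: ${large:,.3f} per answer")
```

On these assumptions the lightly used model costs about $78 per answer and the heavily used one about eight cents, although the two cost exactly the same to build.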

Average cost per answer falls steeply as usage grows, because a huge fixed training cost is spread over more queries while the marginal cost of each answer stays near zero

The pattern is not new; what is new is the combination. Drug companies have long lived with it in one form: the first pill embodies a decade of research, the second costs pennies to press, so the industry organises itself around blockbusters, patents and scale. Utilities live with it in another: the grid or the waterworks soaks up capital for decades, and the marginal kilowatt is cheap, so the industry tends towards regulated monopoly. AI manages to be both at once. It has pharma’s research race — each new frontier model is a bet of hundreds of millions on a recipe that might not work — and the utility’s infrastructure burden, in the data centres and power plants on which the big technology firms spent around $400bn in 2025, with around $650bn pencilled in for 2026 on the firms’ own guidance, as of mid-2026. A business that combines the riskiest cost structure in industry with the heaviest is a business that very few balance sheets can enter.

Falling costs compound through learning as well as scale. In manufacturing the regularity is old enough to have a name, Wright’s law, first observed in aircraft factories in the 1930s and since seen in chipmaking: each doubling of cumulative production cuts unit costs by a roughly constant fraction, as engineers wring waste out of the process. Something similar, only faster, is happening to the cost of running models. For a fixed level of capability, the price of AI computation (billed per “token”, roughly a word-fragment of text) has been falling around tenfold a year by the most conservative measures; chapter 2’s Epoch data put the median decline nearer fiftyfold. Performance that cost $30 to $60 per million tokens when GPT-4 launched in early 2023 could be had for well under a dollar three years later, as of mid-2026. Venture capitalists have taken to calling it “LLMflation”. The price of any given grade of machine intelligence does not drift down, as most prices do; it collapses.
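
Both regularities reduce to a line of arithmetic each, sketched below. The 20% learning rate and the $100 first-unit cost are hypothetical values chosen only to show the shape of Wright’s law; the $30 and $60 starting prices and the tenfold annual decline come from the figures above. Function names are invented for the example.

```python
import math


def wright_unit_cost(first_unit_cost: float, cumulative_units: float, learning_rate: float) -> float:
    """Unit cost once `cumulative_units` have been made, if each doubling
    of cumulative output cuts unit cost by `learning_rate`."""
    if cumulative_units < 1 or not 0 <= learning_rate < 1:
        raise ValueError("need cumulative_units >= 1 and 0 <= learning_rate < 1")
    doublings = math.log2(cumulative_units)
    return first_unit_cost * (1.0 - learning_rate) ** doublings


def price_after(start_price: float, annual_fold_decline: float, years: float) -> float:
    """Price of a fixed grade of capability after a steady n-fold yearly decline."""
    return start_price / annual_fold_decline ** years


# Hypothetical: a $100 first unit and a 20% learning rate, after two doublings.
print(f"Wright's law, 4 units made: ${wright_unit_cost(100.0, 4, 0.20):.2f}")

# The chapter's range: $30 to $60 per million tokens, falling tenfold a year for three years.
for start in (30.0, 60.0):
    print(f"${start:.0f} per million tokens becomes ${price_after(start, 10, 3):.2f}")
```

Even the conservative tenfold rate carries both starting prices down to a few cents per million tokens within three years, comfortably inside “well under a dollar”.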

That collapse has an awkward corollary for the people doing the spending: the asset they are building rots. A frontier model is typically the best in the world for a few months, until a rival — or its own maker — ships something better and its price falls towards the floor set by cheaper imitators. The model is a fast-depreciating asset sitting on slow-depreciating foundations: the chips beneath it are generally reckoned to be productive for three to six years, the buildings and power lines for decades. Whether the industry’s accounts recognise how quickly the expensive layer loses value is one of the quieter questions behind the bubble debate, and chapter 9 returns to it.

So, the puzzle. What kind of business spends billions to make something it then gives away almost free? One that has no choice. The cost structure rewards whoever spreads the largest fixed cost over the largest number of users, so each firm must keep paying the rising price of staying at the frontier — training costs growing two- to threefold a year — while the selling price of any fixed level of intelligence falls at least tenfold a year. It is a treadmill that runs faster the longer you stay on it, and only a handful of firms can keep their footing. Who they are, and what their fewness means for the rest of us, is the subject of the next chapter.
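
How fast the treadmill accelerates can be roughed out in a few lines. This is a deliberately crude sketch, not a model of any firm: it assumes a hypothetical $100m training run and a hypothetical one cent of revenue per answer, ignores serving costs, and supposes the firm sells only a fixed grade of intelligence at the falling market price. Real firms soften the squeeze by selling newer, dearer grades.

```python
def break_even_answers(training_cost: float, price_per_answer: float,
                       training_growth: float, price_decline: float, years: int) -> float:
    """Answers that must be sold to recoup that year's training bill,
    ignoring serving costs (a simplifying assumption)."""
    cost = training_cost * training_growth ** years
    price = price_per_answer / price_decline ** years
    return cost / price


START_COST = 100_000_000  # dollars; hypothetical training run
START_PRICE = 0.01        # dollars per answer; hypothetical revenue

# The chapter's rates: training cost up two- to threefold a year, price down tenfold.
for growth in (2, 3):
    for year in range(4):
        needed = break_even_answers(START_COST, START_PRICE, growth, 10, year)
        print(f"growth {growth}x, year {year}: {needed:.1e} answers to break even")
```

Under these assumptions the volume needed to break even multiplies twenty- to thirtyfold each year, which is the treadmill in numbers.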

What to watch

  • Epoch AI’s tracker of frontier training costs: are the largest runs still growing two- to threefold a year, or has the curve bent?
  • The price of a fixed level of capability (for instance, GPT-4-grade performance per million tokens): is the tenfold-or-faster annual decline holding?
  • How long a frontier model stays the frontier — a lengthening reign would slow the depreciation treadmill.
  • Whether big technology firms lengthen or shorten the depreciation schedules for AI hardware in their accounts.

Dig deeper