

The Hidden Stack: The Real Truth About AI Infrastructure Spending (And Why It’s Not on the Charts)

  • Writer: Tony Grayson
  • Dec 15, 2025
  • 9 min read

Updated: Jan 5

By Tony Grayson, President & GM of Northstar Enterprise + Defense | Built & Exited Top 10 Modular Data Center Company | Top 10 Data Center Influencer | Former SVP Oracle, AWS & Meta | U.S. Navy Nuclear Submarine Commander | Stockdale Award Recipient


Published: December 15, 2025 | Updated: January 5, 2026 | Verified: January 5, 2026


TL;DR: 

"The industry is funding an Apollo Program every 10 months— $400 billion in AI infrastructure spending annually. But 70% of production teams now use open source, shifting spend from SaaS to the "Hidden Stack" of compute and power. If you want to see where the market is going, don't follow the invoices...follow the megawatts." — Tony Grayson, President & GM Northstar Enterprise + Defense


In 30 seconds: 

When companies switch from GPT-4 to Llama, money moves from "Software" to "Utility" budgets. This is the Hidden Stack: it appears as $0 on SaaS charts, even as billions flow to compute infrastructure. Inference spending is overtaking training. The chain of custody: Tokens → GPU-seconds → kWh → MW at the meter.


COMMANDER'S INTENT: THE MAP VS. THE TERRITORY

The Mission: Reveal where the $400B+ in AI infrastructure spending is actually going—beyond the SaaS charts.

The Reality: 70% of production AI teams use open source models. When they switch from APIs to self-hosted inference, spending shifts from software licenses to GPU hours and data center power. This "Hidden Stack" doesn't appear on marketing charts.

The Tactical Takeaway: If you want to see where the market is actually going, don't look at SaaS receipts. Look at reserved GPU-hours. Look at the committed colo kW. Follow the megawatts.


The industry is funding an Apollo Program every 10 months. But if you look past the credit card receipts and follow the megawatts, you'll see the real story of AI infrastructure spending.


The Map vs. The Territory: an a16z-style bar chart of SaaS spend (the Map) beside a data center server aisle (the Territory). Marketing charts track software licenses, but the physical build-out shows AI infrastructure spending shifting toward the 'Hidden Stack' of open source and colocation.

How Much Is the AI Industry Actually Spending?


The scale of investment happening in AI right now is staggering.


Tech companies are projected to reach roughly $400 billion in AI infrastructure spending globally this year (2025). To put that in perspective: adjusted for inflation, that is more money spent in a single year than the entire Apollo Moon Program cost over a decade (~$257 billion in today's terms).


And the trajectory is accelerating. Combined AI capital expenditures in the U.S. alone are on track to exceed $500 billion in 2026.


In essence, the industry is now funding an Apollo Program-sized infrastructure investment every 10 months.


But here is the question no one is asking: If we are spending $400 billion a year, what are we actually building?


Why Don't Spending Charts Show the Full Picture?


In military planning, we employ a mental model known as "The Map Is Not the Territory." A map is a reduction of reality: useful for orientation, but dangerous if you mistake it for the terrain itself.


I was reminded of this when I saw a widely circulated chart this week titled "Where Startups Spend on AI." It lists the usual suspects (OpenAI, Anthropic, and a handful of SaaS tools), while Open Source models like Llama appear to hold roughly 11% of the market.


This chart is the Map. It accurately tracks who gets the monthly credit card payments (SaaS revenue). But it completely misses the Territory.




According to a November 2025 analysis by Tomasz Tunguz, 70% of production AI teams now use open-source models in some capacity.

  • 48% describe their strategy as "mostly open source."

  • 22% are "only open source."

  • Only 11% stay purely proprietary: ironically, the same 11% slice the spending charts attribute to open source, with the positions reversed.


What Is the Hidden Stack in AI Infrastructure?


If 70% of the market is using Open Source, why does it appear as $0 on the "AI Spend" chart?


Because Open Source isn't a Software cost; it is an Infrastructure cost.


When a startup switches from GPT-4 to Llama 3, it stops writing a check to OpenAI (Software License) and starts writing a check to AWS, CoreWeave, or a Colocation provider (Compute & Power). The money didn't vanish; it just moved from the "Software" budget to the "Utility" budget.


This is the Hidden Stack. It doesn't appear in SaaS metrics, but it is the primary destination for the $400 billion in AI infrastructure spending.


The Real Driver of AI Infrastructure Spending


To understand why this "Hidden Stack" is growing so fast, we have to distinguish between two parallel shifts that are compounding each other.


Shift 1: The Move to Open Source (The Margin Shift).

This explains where the money is going. When you run proprietary models, the model provider (OpenAI) captures the margin. When you run Open Source on bare metal, the Infrastructure Provider captures the margin.


Shift 2: The Move to Inference (The Volume Multiplier).

This explains why the bills are becoming so large. For the past two years, we have focused on Training (brain development). We are now entering the era of Inference (using the brain).


  • Training is episodic. You train a model once, like building a factory. You then tune and update as needed.

  • Inference is continuous. You run the model 24/7/365, like running the assembly line.


As companies move from "experimentation" to "production," the economic pressure to optimize these continuous costs becomes overwhelming. This forces them to move from expensive API calls (SaaS) to efficient, self-hosted open source (Infrastructure).
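
To make that pressure concrete, here is a minimal back-of-envelope sketch comparing a hosted API to reserved, self-hosted capacity. Every price and throughput figure below is an assumption chosen for illustration, not a quote from any provider.

```python
# Break-even sketch: hosted API vs. self-hosted inference.
# All prices and throughput figures are illustrative assumptions.

API_PRICE_PER_M_TOKENS = 3.00         # assumed blended $/1M tokens on a hosted API
GPU_HOURLY_RATE = 2.50                # assumed $/GPU-hour for reserved bare metal
TOKENS_PER_GPU_HOUR = 2_000 * 3_600   # assumed throughput: 2,000 tokens/sec, batched

def monthly_api_cost(tokens_per_month: float) -> float:
    """Cost of serving the volume through a metered API."""
    return tokens_per_month / 1_000_000 * API_PRICE_PER_M_TOKENS

def monthly_selfhost_cost(tokens_per_month: float) -> float:
    """Cost of serving the same volume on reserved GPUs."""
    gpu_hours = tokens_per_month / TOKENS_PER_GPU_HOUR
    return gpu_hours * GPU_HOURLY_RATE

for tokens in (1e9, 10e9, 100e9):     # 1B, 10B, 100B tokens per month
    api = monthly_api_cost(tokens)
    self_hosted = monthly_selfhost_cost(tokens)
    print(f"{tokens / 1e9:>5.0f}B tokens/mo   API ${api:>9,.0f}   self-hosted ${self_hosted:>9,.0f}")
```

The sketch ignores engineering time, idle capacity, and networking, but it shows the shape of the curve: API spend scales linearly with tokens forever, while reserved capacity behaves like a fixed cost up to its throughput ceiling. That gap is the economic pressure described above.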


  • The Forecast: Gartner projects that spending on inference will overtake training workloads in AI-optimized IaaS in 2026, rising to over 65% of all AI compute spending by 2029.

  • The Scale: To give you an idea of the volume, analysis suggests OpenAI spent roughly $8.67 billion on inference alone in just the first nine months of 2025.


The Hybrid Reality


To be clear, this isn't a binary switch. Most sophisticated teams are running a hybrid architecture. They use GPT-4 or Claude 3.5 for complex reasoning (low volume), and fine-tuned Llama models for high-volume, repetitive tasks.


But in the last 12 months, the economics have forced a hard shift toward the latter.

  • The Catalyst: We have reached near "Model Parity," where open models such as Llama 3.1 are sufficiently accurate for many production tasks.

  • The Result: A Series B fintech company we track recently reduced its inference costs by 28% in one quarter by moving its high-volume document processing from a closed API to self-hosted Llama on bare metal.


When you are a seed-stage startup, you pay for convenience (e.g., APIs). When you scale, you pay for control (Infrastructure).
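
For readers who want to picture the hybrid pattern, here is a minimal routing sketch. The task categories, backend labels, and thresholds are placeholders I am assuming for illustration; they are not a prescription for any particular stack.

```python
# Hypothetical hybrid router: self-hosted open model for high-volume,
# repetitive work; frontier API reserved for low-volume, complex reasoning.
# Task categories and backend names are illustrative assumptions.

HIGH_VOLUME_TASKS = {"classification", "extraction", "summarization"}

def route(task_type: str) -> str:
    """Pick the backend an assumed hybrid stack would call for this task."""
    if task_type in HIGH_VOLUME_TASKS:
        return "self-hosted-llama"   # fine-tuned open model on reserved GPUs
    return "frontier-api"            # closed API for hard, low-volume reasoning

print(route("extraction"))   # -> self-hosted-llama
print(route("planning"))     # -> frontier-api
```

In practice the routing signal is usually richer (prompt length, latency budget, accuracy requirements), but the budget consequence is the same: the high-volume path lands on the Hidden Stack, not on a SaaS invoice.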


Conclusion: Follow the Megawatts


"The market is voting with its code, not just its credit cards. Tokens → GPU-seconds → kWh → MW at the meter."


— Tony Grayson, former SVP Oracle, AWS, Meta


It is easy to get distracted by the Map. SaaS funds must demonstrate SaaS revenue to justify their valuations. However, as operators, we have to consider the Territory.


If you want to see where the market is actually going, don't look at SaaS receipts. Look at reserved GPU-hours. Look at the committed colo kW. Look at transformer lead times.


The winners of this shift aren't just the software providers; they are the entire physical stack: the power-secured sites, the switchgear supply chains, the liquid-cooling technologies, and the efficient inference silicon.


The market is voting with its code, not just its credit cards. And the chain of custody is clear: Tokens → GPU-seconds → kWh → MW at the meter.
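
For anyone who wants to trace that chain of custody on their own workload, here is a minimal sketch. The daily token volume, per-GPU throughput, power draw, and PUE are assumptions chosen for illustration; substitute your own fleet's numbers.

```python
# Back-of-envelope chain of custody: tokens -> GPU-seconds -> kWh -> MW at the meter.
# Every input below is an illustrative assumption, not a measured figure.

TOKENS_PER_DAY = 500e9          # assumed daily inference volume for a large production fleet
TOKENS_PER_GPU_SECOND = 2_000   # assumed batched throughput per accelerator
GPU_POWER_KW = 0.7              # ~700 W per accelerator (assumed)
PUE = 1.3                       # assumed facility overhead (cooling, power conversion)

gpu_seconds = TOKENS_PER_DAY / TOKENS_PER_GPU_SECOND   # total GPU-seconds per day
gpu_hours = gpu_seconds / 3_600
it_kwh = gpu_hours * GPU_POWER_KW                      # IT energy per day
site_kwh = it_kwh * PUE                                # energy at the meter
avg_mw = site_kwh / 24 / 1_000                         # average megawatt draw

print(f"GPU-hours per day:        {gpu_hours:,.0f}")
print(f"kWh per day at the meter: {site_kwh:,.0f}")
print(f"Average MW at the meter:  {avg_mw:.1f}")
```

Under these assumptions the fleet draws a few megawatts around the clock, and none of it shows up as a SaaS line item. That arithmetic is what turns token forecasts into reserved colo kW and transformer lead times.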


If you want adoption, don't follow the invoices. Follow the megawatts.



KEY TAKEAWAY:

The market is voting with its code, not just its credit cards. The chain of custody is clear: Tokens → GPU-seconds → kWh → MW at the meter. The winners of this shift aren't just software providers—it's the entire physical stack: power-secured sites, switchgear supply chains, liquid-cooling technologies, and efficient inference silicon. If you want adoption, follow the megawatts.



Frequently Asked Questions: AI Infrastructure Spending & The Hidden Stack


What is the 'Hidden Stack' in AI infrastructure?

The Hidden Stack refers to AI infrastructure costs (compute, power, colocation, GPU hours) that don't appear on SaaS spending charts but represent the majority of AI infrastructure spending. When companies switch from proprietary APIs like GPT-4 to open source models like Llama, spending shifts from software licenses to utility budgets—GPU hours, reserved capacity, and data center power. This is where the real $400+ billion is going. IDC projects AI infrastructure spending will reach $758 billion by 2029.


Why doesn't open source AI show up on spending charts?

Open source AI isn't a software cost—it's an infrastructure cost. When enterprises use Llama instead of paying OpenAI, they stop writing checks for API calls and start paying AWS, CoreWeave, or colocation providers for compute and power. Marketing charts track SaaS revenue (the Map), but miss the physical reality of data center spending (the Territory). Over 70% of production AI teams now use open source models, but it appears as $0 on software spending charts while billions flow to GPU compute and data center infrastructure.


How much does AI infrastructure cost in 2025?

Global AI infrastructure spending exceeded $400 billion in 2025 and is projected to surpass $500 billion by 2026, according to IEEE ComSoc. IDC forecasts AI infrastructure spending will reach $758 billion by 2029. Major hyperscalers are investing heavily: Meta plans $72 billion on AI infrastructure in 2025, Microsoft $80 billion, Amazon $100 billion, and Alphabet $75 billion. McKinsey estimates total data center capital expenditure could reach $5.2–7.9 trillion by 2030.


What is the difference between AI training and inference costs?

Training is episodic (building the model once, like constructing a factory), while inference is continuous (using the model 24/7/365, like running an assembly line). Training is a one-time CapEx cost, but inference accounts for 80–90% of total AI lifetime costs. Gartner projects inference spending will overtake training, rising to over 65% of all AI compute spending by 2029. This volume multiplier forces companies to move from expensive APIs to efficient self-hosted infrastructure.


How much power do AI data centers consume?

AI data centers typically require 200+ megawatts compared to 30–50 megawatts for traditional data centers—a 4–7x increase. The IEA projects global data center electricity consumption will double to 945 TWh by 2030. BloombergNEF forecasts US data center power demand alone will more than double to 78 gigawatts by 2035. A single modern AI GPU consumes up to 3.7 MWh per year. OpenAI's Stargate initiative anticipates multi-gigawatt data centers.


What is 'Model Parity' and why does it matter for AI infrastructure?

Model Parity means open source models like Llama are now "good enough" for most production tasks that previously required proprietary models. According to Stanford's AI Index Report, open-weight models have closed the performance gap to just 1.7% on some benchmarks. This is the catalyst driving the shift to self-hosted infrastructure. Companies are cutting inference costs by 28–50% by moving from closed APIs to self-hosted open source on bare metal.


How much can companies save by self-hosting AI models?

Enterprises using self-hosted open source models can reduce token costs by over 90% compared to public APIs. ScaleOps reports early adopters achieving 50–70% GPU cost reductions. Companies have documented 30% cost reductions and more than 4,000 employee hours saved annually with self-hosted generative AI. The economics shift dramatically at scale: API costs scale linearly with usage, while self-hosted infrastructure provides unlimited inference for fixed fees.


Why are AI inference costs rising for enterprises?

Inference costs are rising due to wider AI adoption, demand for real-time performance, increasing model complexity, and growing data volumes. Gartner warns companies scaling AI can face cost estimation errors of 500–1,000%. IBM reports average computing costs climbed 89% between 2023 and 2025. Every prompt generates tokens that incur costs, and as usage increases, token generation and associated computational costs scale proportionally.


What percentage of companies use open source AI models?

Over 70% of production AI teams now use open source models according to industry surveys. Meta's Llama models have been downloaded over 400 million times, at a rate 10x higher than the previous year. McKinsey found 72% of organizations have adopted open-source AI systems. In Latin America and other regions, open-source models are adopted at even higher rates than proprietary alternatives in production scenarios.


How much does it cost to train an AI model?

Frontier model training costs grow at 2.4x per year. GPT-4's estimated development cost was $78 million+, Gemini Ultra around $191 million. However, efficiency breakthroughs like DeepSeek-V3 demonstrated training costs can be reduced dramatically—costing under $6 million. Training costs are split as follows: 47–67% hardware, 29–49% R&D staff, and 2–6% energy. By 2027, the largest training runs will exceed $1 billion.


What is 'Follow the Megawatts' in AI infrastructure analysis?

Follow the Megawatts is an analytical approach that tracks AI spending through power consumption rather than software invoices. Since open source AI appears as $0 on SaaS charts but requires massive compute infrastructure, tracking megawatt demand reveals true AI spending patterns. Data centers expanding from tens of megawatts to hundreds show where capital is actually flowing in the AI economy.


How much do GPUs cost for AI workloads?

High-performance AI GPUs like NVIDIA H100 cost $25,000–$40,000+ each. Cloud GPU instances range from $2–$15 per hour, depending on the card and provider. A single NVIDIA A100-80G GPU costs around $2/hour minimum, meaning training a large model can cost $4 million+ in hardware usage alone. Modern AI GPUs can consume up to 700 watts each, with next-generation models requiring even more power.
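
For a rough sense of that arithmetic (all figures assumed): a training run consuming about 2 million GPU-hours at $2 per GPU-hour works out to roughly $4 million in rental cost alone, before staff, storage, and networking.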



____________________________________


Tony Grayson is a recognized Top 10 Data Center Influencer, a successful entrepreneur, and the President & General Manager of Northstar Enterprise + Defense.


A former U.S. Navy Submarine Commander and recipient of the prestigious VADM Stockdale Award, Tony is a leading authority on the convergence of nuclear energy, AI infrastructure, and national defense. His career is defined by building at scale: he led global infrastructure strategy as a Senior Vice President for AWS, Meta, and Oracle before founding and selling a top-10 modular data center company.


Today, he leads strategy and execution for critical defense programs and AI infrastructure, building AI factories and cloud regions that withstand real-world conditions.


