When you send a prompt to ChatGPT, Claude, or Gemini, the response doesn’t come from software alone. It passes through at least 7 layers of physical hardware, each built by different companies, each taking a cut of the $700B+ being poured into AI infrastructure this year.

Most people know about Nvidia. But the AI hardware stack is much deeper than GPUs. Here’s every layer, what it does, and who profits.

Layer 1: The GPU (The Brain)

What it does: Processes the matrix multiplications that make neural networks work. Thousands of parallel cores crunch numbers simultaneously - this is where the actual “thinking” happens.

Why it matters: A single AI training run can use 10,000+ GPUs running for weeks. Inference (serving predictions to users) requires fewer GPUs but runs 24/7. The GPU is the most expensive single component in any AI data center.
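
To see why the counts get that large, here's a back-of-the-envelope sketch using the common ~6 FLOPs per parameter per training token rule of thumb. Model size, token count, per-GPU throughput, and utilization are all illustrative assumptions, not figures from this article:

```python
# Why training runs need 10,000+ GPUs for weeks: a back-of-the-envelope
# using the common ~6 FLOPs per parameter per training token rule.
# All model and hardware numbers below are illustrative assumptions.

params = 400e9           # assumed 400B-parameter model
tokens = 10e12           # assumed 10T training tokens
flops_needed = 6 * params * tokens

n_gpus        = 10_000   # cluster size from the scenario above
per_gpu_flops = 2e15     # assumed ~2 PFLOP/s dense FP8 per GPU
mfu           = 0.40     # assumed 40% model FLOPs utilization

days = flops_needed / (n_gpus * per_gpu_flops * mfu) / 86_400
print(f"~{days:.0f} days on {n_gpus:,} GPUs")   # -> ~35 days
```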

Who benefits:

| Company | Product | Market Share | Notes |
| --- | --- | --- | --- |
| Nvidia | H100, B200, Rubin | ~90% | Dominant. CUDA ecosystem is the moat. |
| AMD | MI300X, MI400X | ~5-10% | Gaining traction. 192GB of HBM3 on the MI300X. |
| Intel | Gaudi 3 | <5% | Budget play. Competing on price, not performance. |

Custom silicon is the wildcard:

  • Google designs TPUs (built by Broadcom) - used for Gemini training
  • Amazon has Trainium chips - used for internal AI and offered on AWS
  • Meta is developing custom AI silicon with Broadcom
  • OpenAI signed a $10B deal with Broadcom for custom chips

Broadcom is the hidden giant here - designing custom ASICs for Google, Meta, OpenAI, and Anthropic, with AI revenue projected at $46B in 2026.


Layer 2: Memory (The Short-Term Storage)

What it does: Feeds data to the GPU fast enough that it doesn't sit idle. Many AI workloads - inference especially - are memory-bandwidth-bound: the GPU can compute faster than memory can deliver data. HBM (High Bandwidth Memory) attacks this by stacking memory dies vertically and connecting them with thousands of through-silicon vias.

Why it matters: Memory is now the bottleneck in AI. A model like GPT-4 needs hundreds of gigabytes of memory just to hold its weights. The KV-cache (storing context during inference) grows with every token generated. Without enough fast memory, the most powerful GPU in the world sits idle.
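
For a feel for the KV-cache problem, here's a rough sizing sketch. The shape below is an assumption (roughly a 70B-class model with grouped-query attention), not the architecture of any specific model:

```python
# Rough KV-cache sizing for one inference request.
# All shapes below are illustrative assumptions.

n_layers    = 80      # transformer layers
n_kv_heads  = 8       # KV heads (grouped-query attention)
head_dim    = 128     # dimension per head
dtype_bytes = 2       # fp16/bf16

# One K and one V vector per layer, per token
bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes
context_len = 32_768

cache_gb = bytes_per_token * context_len / 1e9
print(f"{bytes_per_token/1024:.0f} KB per token, "
      f"~{cache_gb:.1f} GB for a {context_len:,}-token context")
# -> 320 KB per token, ~10.7 GB of HBM for a single long-context user
```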

Who benefits:

| Company | Product | Market Share | Notes |
| --- | --- | --- | --- |
| SK Hynix | HBM3E, HBM4 | ~62% | Market leader. 2026 capacity already fully booked. |
| Samsung | HBM3E | ~25% | Playing catch-up on yields and quality. |
| Micron | HBM3E | ~13% | #3 player but gaining pricing power. |

SK Hynix’s position is remarkable - big tech companies like Microsoft, Google, and Meta are reportedly “stationed in Korea” trying to secure additional HBM capacity. Their entire 2026 output is already sold out.

The numbers: Nvidia’s Rubin GPU has 288GB of HBM4 with 22 TB/s bandwidth. Rubin Ultra will have up to 1TB of HBM4e. Each chip needs multiple HBM stacks, and each stack costs hundreds of dollars. Memory is no longer cheap commodity hardware - it’s premium, supply-constrained, high-margin silicon.
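
Those bandwidth numbers map directly onto inference speed. At batch size 1, generating each token means streaming every model weight through the GPU once, so memory bandwidth sets a hard ceiling on tokens per second. A rough sketch, using the Rubin bandwidth figure above and an assumed model size:

```python
# Memory bandwidth as the ceiling on single-user inference speed:
# at batch size 1, tokens/sec <= bandwidth / weight bytes.
# Bandwidth is the Rubin figure quoted above; model size is assumed.

bandwidth_bps = 22e12        # 22 TB/s HBM4 (figure from this article)
weight_bytes  = 200e9        # assumed 200B parameters at 1 byte (FP8)

print(f"<= {bandwidth_bps / weight_bytes:.0f} tokens/s per user")
# -> <= 110 tokens/s; batching amortizes the weight reads across
#    users, which is why serving stacks push for large batches
```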


Layer 3: Chip Manufacturing and Packaging

What it does: Actually builds the physical chips. Designing a chip is one thing. Manufacturing it at 3nm with billions of transistors and packaging it with HBM stacks is another entirely.

Why it matters: Nearly every AI chip - Nvidia, AMD, Google TPU, Amazon Trainium - is manufactured by one company. This is the ultimate chokepoint.

Who benefits:

| Company | Role | Why It Matters |
| --- | --- | --- |
| TSMC | Foundry (3nm, 5nm) | Manufactures chips for Nvidia, AMD, Broadcom, Apple, Qualcomm. The single most critical company in the AI supply chain. |
| TSMC (CoWoS) | Advanced packaging | CoWoS (Chip-on-Wafer-on-Substrate) packages GPU dies with HBM stacks. Capacity: ~110K wafers/month in 2026. Already sold out. |
| ASE / SPIL | Assembly and test | Backend packaging and testing for finished chips. |

TSMC is irreplaceable. There is no alternative at the leading edge. Their CoWoS advanced packaging capacity is the hard constraint on how many AI GPUs can exist in the world. When Jensen Huang says Nvidia is “supply constrained,” he means TSMC’s CoWoS lines are maxed out.
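
You can sketch what that constraint implies. The wafer capacity below is the 2026 figure quoted above; packages-per-wafer and yield are loose assumptions (reticle-sized GPU+HBM interposers fit only a few dozen on a 300mm wafer):

```python
# Rough ceiling on AI accelerator supply implied by CoWoS capacity.
# Only the wafer count comes from this article; the rest is assumed.

wafers_per_month   = 110_000   # TSMC CoWoS capacity (from this article)
packages_per_wafer = 25        # assumed
good_yield         = 0.8       # assumed

packages_per_year = wafers_per_month * 12 * packages_per_wafer * good_yield
print(f"~{packages_per_year/1e6:.0f}M packaged accelerators/year")
# -> ~26M/year, split across Nvidia, AMD, Broadcom's ASIC customers
#    and everyone else - the whole industry draws from one pool
```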


Layer 4: Networking (The Nervous System)

What it does: Connects GPUs to each other. AI training distributes a model across thousands of GPUs that need to communicate constantly - sharing gradients, synchronizing parameters, exchanging activations. The network bandwidth between GPUs matters as much as the GPU compute itself.

Why it matters: In a 10,000-GPU training cluster, GPUs can spend 30-50% of their time waiting on data from other GPUs. Faster networking translates directly into faster training and lower cost.
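
The standard ring all-reduce gives a feel for the traffic involved: each GPU sends and receives roughly 2(N-1)/N times the gradient size per synchronization. A sketch with assumed (not sourced) model and link numbers:

```python
# Traffic for one gradient all-reduce in data parallelism, using the
# standard ring all-reduce cost: each GPU sends and receives about
# 2*(N-1)/N * gradient_bytes. Model size and NIC speed are assumptions.

n_gpus     = 1_024
grad_bytes = 70e9 * 2          # assumed 70B params, bf16 gradients
nic_bw     = 50e9              # assumed 400 Gb/s NIC ~= 50 GB/s

per_gpu_traffic = 2 * (n_gpus - 1) / n_gpus * grad_bytes
seconds = per_gpu_traffic / nic_bw
print(f"~{per_gpu_traffic/1e9:.0f} GB per GPU per step, "
      f"~{seconds:.1f} s at {nic_bw/1e9:.0f} GB/s")
# -> ~280 GB and ~5.6 s per step if not overlapped with compute -
#    exactly the waiting described above
```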

Who benefits:

| Company | Product | Role |
| --- | --- | --- |
| Nvidia | NVLink 6, NVSwitch | GPU-to-GPU interconnect within a rack. 3.6 TB/s per GPU on Rubin. |
| Nvidia (Mellanox) | InfiniBand, ConnectX-9, Spectrum-X | Rack-to-rack and cluster-wide networking. Nvidia acquired Mellanox for $7B in 2020 - now it's the backbone of every AI data center. |
| Broadcom | Tomahawk, Jericho switches | Ethernet switches for AI fabrics. The alternative to Nvidia's InfiniBand. |
| Arista Networks | Cloud networking switches | High-performance Ethernet for hyperscaler data centers. |
| Amphenol, TE Connectivity | Cables and connectors | The physical copper and optical cables connecting everything. Boring but essential. |

Nvidia’s acquisition of Mellanox might be the most underrated deal in tech history. By controlling both the GPU and the network, Nvidia owns the full data path. Competitors can match the GPU, but matching the integrated network is much harder.


Layer 5: Servers and Rack Assembly (The Skeleton)

What it does: Assembles GPUs, CPUs, memory, networking, and cooling into rack-ready servers that data centers can deploy.

Who benefits:

| Company | Role | Notes |
| --- | --- | --- |
| Dell | Server OEM | Major partner for Nvidia DGX and HGX systems. |
| HPE | Server OEM | Enterprise-focused AI server deployments. |
| Supermicro | Server OEM | Fast-to-market GPU servers. Popular with neoclouds. |
| Foxconn (Hon Hai) | ODM manufacturing | Builds servers for hyperscalers at massive scale. |
| Quanta Computer | ODM manufacturing | One of the largest server ODMs globally. |

Layer 6: Power and Cooling (The Life Support)

What it does: Keeps everything running and prevents it from melting. A single Nvidia B200 GPU draws 1,000W. A rack of 72 Rubin GPUs needs enough power for a small neighborhood and enough cooling to prevent thermal shutdown.

Why it matters: AI data centers now consume 2-5x more power per rack than traditional cloud data centers. Cooling is shifting from air-based to liquid-based. Power availability is becoming the primary constraint on where new data centers can be built.
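
A rough rack-level tally makes the point. The per-GPU figure is the B200 number quoted above; the overhead and PUE multipliers are illustrative assumptions:

```python
# Rough power tally for the 72-GPU rack described above.
# Only the per-GPU wattage comes from this article; the rest is assumed.

n_gpus    = 72
gpu_watts = 1_000   # B200 draw from the article; newer parts draw more
overhead  = 1.3     # assumed: CPUs, NICs, switches, fans, conversion loss
pue       = 1.2     # assumed facility overhead, mostly cooling

it_kw    = n_gpus * gpu_watts * overhead / 1000
total_kw = it_kw * pue
print(f"~{it_kw:.0f} kW IT load, ~{total_kw:.0f} kW at the meter")
# -> ~94 kW IT, ~112 kW total: continuous draw comparable to
#    roughly a hundred US homes, for one rack
```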

Who benefits:

| Company | Product | Role |
| --- | --- | --- |
| Vertiv | Power + cooling systems | The market leader. Sells everything from switchgear to coolant distribution units. Deep integration with Nvidia. Liquid cooling revenue doubled in Q1 2025. |
| Eaton | Power distribution | Electrical infrastructure for data centers. |
| Schneider Electric | Power + cooling | UPS systems, PDUs, and cooling for data centers. |
| CoolIT Systems | Direct liquid cooling | Liquid cooling solutions specifically for GPU racks. |
| Celestica | Power shelf assemblies | Custom power solutions for hyperscaler GPU racks. |

Vertiv is the quiet winner here. Every Nvidia GPU rack needs their power and cooling infrastructure. As AI data centers scale from megawatts to gigawatts, Vertiv’s addressable market grows proportionally. Their cooling segment is projected to grow at 40% CAGR through 2028.


Layer 7: Data Center Facilities (The Building)

What it does: The physical building - real estate, power grid connection, fiber connectivity, security, and environmental controls.

Who benefits:

| Company | Role | Notes |
| --- | --- | --- |
| Equinix | Colocation provider | Largest data center REIT globally. Interconnection hub. |
| Digital Realty | Colocation provider | Major wholesale data center provider. |
| CoreWeave, Nebius, Lambda | GPU-native neoclouds | Build and operate their own AI-optimized facilities. |
| Hyperscalers (AWS, Azure, GCP) | Own data centers | Building at unprecedented scale - Microsoft alone is spending $80B+ on AI data centers in 2026. |

The Full Picture

Here’s what happens when you send a prompt to an AI model:

Your prompt
  → Internet → Data center facility (Equinix/hyperscaler)     [Layer 7]
    → Power infrastructure (Vertiv/Eaton)                      [Layer 6]
      → Cooling systems (Vertiv/CoolIT)                        [Layer 6]
        → Network switches (Arista/Broadcom)                   [Layer 4]
          → Server (Dell/Supermicro)                           [Layer 5]
            → NVLink/InfiniBand fabric (Nvidia/Mellanox)       [Layer 4]
              → GPU (Nvidia Blackwell/Rubin)                   [Layer 1]
                → HBM memory (SK Hynix)                        [Layer 2]
                  → Chip manufactured at (TSMC 3nm + CoWoS)    [Layer 3]
                    → Model processes your prompt
                  → Response travels back up the stack
→ Your screen

Every layer takes a margin. Every layer has companies generating billions in revenue. The total AI infrastructure market is projected to hit $1 trillion by 2030.

The Investment Thesis by Layer

If you believe AI demand keeps growing:

| Layer | Highest conviction play | Why |
| --- | --- | --- |
| GPU | Nvidia | 90% share, full-stack integration, annual cadence |
| Custom silicon | Broadcom | 60% custom ASIC share by 2027, $46B AI revenue |
| Memory | SK Hynix | 62% HBM share, sold out through 2026 |
| Manufacturing | TSMC | Irreplaceable. Every AI chip goes through them. |
| Networking | Arista Networks | Benefits from all AI data center buildouts |
| Power/Cooling | Vertiv | 40% CAGR in cooling, deep Nvidia integration |
| Neocloud | CoreWeave / Nebius | Triple-digit growth, massive backlogs |

The risk across all layers: If the $700B in AI capex doesn’t generate returns, every company on this list gets hit. The entire AI hardware stack is a correlated bet on AI demand. There’s no hedge within this ecosystem - if AI spending slows, it slows for everyone from TSMC to Vertiv.

Bottom Line

When people say “invest in AI,” most think Nvidia. But the AI hardware stack has 7 layers, each with companies generating billions. The smartest money isn’t concentrated in one layer - it’s spread across the supply chain.

SK Hynix is sold out through 2026. TSMC’s packaging lines are maxed. Vertiv’s cooling revenue is doubling. These aren’t speculative bets on AI’s future - they’re companies selling into confirmed, paid-for demand today.

The AI gold rush is real. And the companies selling picks, shovels, water, and maps are all making money.