Nvidia made $60 billion in revenue in fiscal year 2024. A substantial portion of that came from a handful of companies - Google, Microsoft, Meta, Amazon - each spending billions on H100 and A100 GPUs.

Every single one of those companies is actively working to stop buying from Nvidia. Not because Nvidia’s products are bad - they are excellent - but because the strategic and economic logic of custom silicon is overwhelming at their scale.

The Cost Calculation

An H100 GPU costs approximately $25,000-$35,000. A large AI training cluster might use 10,000-50,000 of them. That is $250M to $1.75B just for the hardware, before power, cooling, networking, and operational costs.

At that scale, a chip that is 30% more efficient for your specific workload saves hundreds of millions of dollars. And custom silicon can easily be 2-5x more efficient for narrow tasks because it is designed without the generality requirements that make GPUs expensive.
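The arithmetic above can be sketched directly. This is a hypothetical cost model with illustrative numbers (the 30,000-GPU cluster and ~$30k unit price are assumptions drawn from the ranges quoted above, not vendor pricing):

```python
# Hypothetical cost model for the cluster math above.
# All figures are illustrative assumptions, not vendor pricing.

def cluster_hardware_cost(num_gpus: int, unit_price: float) -> float:
    """Total accelerator spend, before power, cooling, and networking."""
    return num_gpus * unit_price

def efficiency_savings(baseline_cost: float, efficiency_gain: float) -> float:
    """Spend avoided if a custom chip needs `efficiency_gain` fewer
    accelerators for the same workload (0.30 = 30% more efficient)."""
    return baseline_cost * efficiency_gain

baseline = cluster_hardware_cost(30_000, 30_000)  # 30k GPUs at ~$30k each
print(f"baseline hardware:   ${baseline / 1e9:.2f}B")
print(f"30% efficiency saves ${efficiency_savings(baseline, 0.30) / 1e6:.0f}M")
```

At the midpoint of the ranges above, a 30% efficiency gain is worth roughly $270M on hardware alone, which is why the design investment pencils out.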

GPU generality is a feature for the general market and a cost center for companies that only run one type of workload.

What Each Company Built

Company     Chip                        Primary use
Google      TPU (v5 current)            Training + inference for all Google AI
Meta        MTIA                        Inference for recommendation systems
Amazon      Trainium / Inferentia       Training + inference on AWS
Microsoft   Maia 100                    Azure AI training
Apple       Neural Engine (M-series)    On-device inference
OpenAI      In development              Training + inference

Google’s TPU program started in 2015, and TPU v2 added training support in 2017. BERT and early versions of what became LaMDA were trained on TPUs. The near-decade head start shows - Google’s training efficiency for equivalent model quality is ahead of companies running on commodity GPUs.

The Training vs Inference Distinction

This matters because the economics are different:

Training is a one-time (or periodic) massive compute job. You want maximum FLOPS per dollar, and the GPU architecture, with its general matrix-multiply capability, is very good at this. Custom silicon helps here, but the gains are more incremental.

Inference is what you pay for every day, forever, at scale. Every user query, every API call, every recommendation generates an inference. At a company serving billions of queries per day, inference efficiency directly translates to margin.

Custom inference chips can be dramatically more efficient because they are designed for one thing: running a specific size and type of model as fast as possible with as little power as possible. Meta’s MTIA is optimized for the recommendation models Meta runs, not for general neural network architectures.

Nvidia’s Response

Nvidia is not standing still. The H100 and H200 are the best general-purpose AI accelerators available and they will remain relevant for years. The upcoming Blackwell architecture is aimed squarely at the inference market that custom silicon is attacking.

Nvidia’s CUDA software ecosystem is also a genuine moat. Nearly two decades of libraries, tools, and developer familiarity make switching painful even when the alternative hardware is better on raw specs. Custom silicon without a mature software stack is often worse in practice than Nvidia hardware with excellent software support.

This is why several custom silicon programs have underdelivered. Amazon’s Trainium chips are technically impressive but the software toolchain is still inferior to CUDA. Getting engineers to optimize for non-CUDA hardware requires significant investment.

The Software Stack Problem

Hardware without software is useless. The AI training and inference software stack is dominated by PyTorch, which was built with GPU architecture assumptions baked in. Running PyTorch on custom silicon requires:

  • A hardware abstraction layer (XLA, ROCm, etc.)
  • Compilation from PyTorch operations to hardware-specific instructions
  • Debugging tools for a new instruction set
  • Profiling tools to understand hardware bottlenecks

Google’s JAX/XLA stack works well with TPUs because Google built both. For other companies, software support is often the bottleneck, not hardware performance.
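The "compilation from PyTorch operations to hardware-specific instructions" step can be illustrated with a deliberately toy sketch. The op names and the target "ISA" below are invented for illustration; real compilers like XLA also do fusion, memory layout, and scheduling, which is where most of the engineering effort goes:

```python
# Toy sketch of op lowering: framework-level ops -> hardware instructions.
# The lowering table and instruction names are invented for illustration.

LOWERING_TABLE = {
    "matmul": "MXU_GEMM",  # matrix multiply on a systolic-array unit
    "add":    "VPU_ADD",   # elementwise add on a vector unit
    "relu":   "VPU_MAX0",  # elementwise max(x, 0)
}

def lower(graph: list[str]) -> list[str]:
    """Translate a linear op graph to hardware instructions,
    failing loudly on any op the chip has no kernel for."""
    instructions = []
    for op in graph:
        if op not in LOWERING_TABLE:
            raise NotImplementedError(f"no kernel for op: {op}")
        instructions.append(LOWERING_TABLE[op])
    return instructions

print(lower(["matmul", "add", "relu"]))
# ['MXU_GEMM', 'VPU_ADD', 'VPU_MAX0']
```

The failure branch is the point: every framework op a model uses needs a kernel, a debugger, and a profiler on the new chip, and any gap stalls real workloads - which is why software support, not hardware performance, is so often the bottleneck.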

Why Startups Cannot Do This

Custom silicon development costs $500M-$2B and takes 4-6 years from design to production deployment. The companies building custom chips are doing so at scales where the ROI calculation works out. Spending $1B on chip development makes sense if it saves you $5B over five years in Nvidia purchases.

For a company spending $50M a year on compute, the math never works. Custom silicon is exclusively a hyperscaler strategy.
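The build-vs-buy break-even described above is a one-line formula. A sketch with the article's own illustrative inputs (the 20% efficiency gain is an assumed figure):

```python
# Break-even sketch for build-vs-buy. Inputs are illustrative assumptions.

def years_to_break_even(dev_cost: float,
                        annual_chip_spend: float,
                        efficiency_gain: float) -> float:
    """Years until efficiency savings cover chip development cost."""
    annual_savings = annual_chip_spend * efficiency_gain
    return dev_cost / annual_savings

# Hyperscaler: $1B development, $5B/yr chip spend, 20% efficiency gain
print(f"hyperscaler: {years_to_break_even(1e9, 5e9, 0.20):.1f} years")
# Mid-size company: same $1B development, only $50M/yr chip spend
print(f"mid-size:    {years_to_break_even(1e9, 50e6, 0.20):.0f} years")
```

The hyperscaler breaks even in about a year; the $50M/yr company would need a century. The development cost is fixed, so the calculation only closes at hyperscaler spend.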

The Geopolitical Dimension

There is a non-economic reason too. US export controls on advanced chips to China have demonstrated that access to cutting-edge silicon can be restricted for geopolitical reasons. Companies building critical AI infrastructure cannot afford to have a single supplier for the most important input to their business.

Diversifying chip supply through in-house development is partially an insurance policy against future supply chain constraints.

Bottom Line

Custom silicon investment by AI companies is rational, not hubristic. When your chip spend is in the billions, even a 20% efficiency improvement from purpose-built hardware pays for the design costs within a year or two. The inference economics are particularly compelling - the more successful an AI product, the stronger the incentive to optimize the per-query hardware cost.

Nvidia will remain dominant in the general market and for companies that cannot justify custom silicon investment. But the hyperscalers running AI at global scale will increasingly run their workloads on chips they designed themselves.