Nvidia doesn’t do small announcements. At CES 2026, Jensen Huang unveiled the Rubin platform - named after astronomer Vera Rubin - and it’s the most aggressive chip roadmap the company has ever put forward. Six new chips. A rack-scale AI supercomputer. And performance numbers that make Blackwell look like a warm-up.
With Nvidia reporting earnings on February 25 and the entire AI trade hanging in the balance, understanding Rubin isn’t optional anymore. It’s where the next $500 billion in AI infrastructure is heading.
What Is Rubin?
Rubin isn’t just a GPU. It’s a complete platform - CPU, GPU, networking, storage, and security - all co-designed from the ground up. Think of Blackwell as Nvidia owning the engine. Rubin is Nvidia owning the entire car.
Here are the six chips:
| Chip | What It Does |
|---|---|
| Rubin GPU | 50 petaflops of FP4 inference, 35 petaflops training. 288GB HBM4 memory. |
| Vera CPU | 88 custom Olympus cores, Armv9.2. Built for agentic AI workloads. |
| NVLink 6 Switch | 3.6 TB/s bandwidth per GPU. The glue holding the rack together. |
| ConnectX-9 SuperNIC | Integrated networking for rack-scale AI. |
| BlueField-4 DPU | Storage processor with built-in SSD for key-value cache. |
| Spectrum-6 Switch | 200G SerDes with co-packaged optics for AI-optimized fabrics. |
The flagship configuration is the Vera Rubin NVL72: a single rack containing 72 Rubin GPUs, 36 Vera CPUs, and all the networking and storage baked in. Total interconnect bandwidth: 260 TB/s.
For context, the entire internet backbone carries roughly 1,000 TB/s by common estimates. A single Nvidia rack now moves about a quarter of that internally.
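A quick sanity check on that 260 TB/s figure, assuming it’s simply the per-GPU NVLink 6 bandwidth summed across the rack (my assumption - Nvidia hasn’t published the exact breakdown):

```python
# Back-of-envelope check of the NVL72's quoted interconnect bandwidth.
# Assumption: the 260 TB/s figure is per-GPU NVLink 6 bandwidth summed
# across all 72 GPUs - Nvidia hasn't published the exact breakdown.
gpus_per_rack = 72
nvlink_bw_per_gpu_tbs = 3.6  # TB/s, from the NVLink 6 Switch spec above

aggregate_tbs = gpus_per_rack * nvlink_bw_per_gpu_tbs
print(f"Aggregate NVLink bandwidth: {aggregate_tbs:.1f} TB/s")  # 259.2 ~ 260

backbone_tbs = 1_000  # the rough internet-backbone figure cited above
print(f"Share of backbone estimate: {aggregate_tbs / backbone_tbs:.0%}")  # 26%
```

The numbers line up: 72 × 3.6 TB/s is 259.2 TB/s, which rounds to the quoted 260.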
Rubin vs Blackwell: The Numbers
This is where it gets wild.
| Metric | Blackwell | Rubin | Improvement |
|---|---|---|---|
| FP4 inference | 10 PFLOPS | 50 PFLOPS | 5x |
| FP4 training | 10 PFLOPS | 35 PFLOPS | 3.5x |
| HBM bandwidth | 8 TB/s | 22 TB/s | 2.8x |
| NVLink bandwidth/GPU | 1.8 TB/s | 3.6 TB/s | 2x |
| Inference cost per token | Baseline | 10x lower | 10x |
| GPUs to train MoE models | Baseline | 4x fewer | 4x |
| Transistor count | Baseline | 1.6x more | 1.6x |
The headline number: 10x reduction in inference token cost. If you’re running a large language model in production today, Rubin means your inference bill drops by 90% at the same throughput. Or you get 10x the throughput at the same cost.
For training, the 4x GPU reduction for Mixture-of-Experts models is massive. A model that required 4,096 Blackwell GPUs could train on 1,024 Rubin GPUs. That’s not just cheaper - it’s fundamentally easier to coordinate.
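To make those multipliers concrete, here’s a minimal sketch of the economics. The dollar figures and workload sizes below are hypothetical and purely illustrative - only the 10x and 4x ratios come from the announcement:

```python
# Illustrative economics of the claimed 10x inference and 4x training gains.
# Every baseline number here is hypothetical, chosen only to show the scaling.

# Inference: 10x lower cost per token at the same throughput.
blackwell_cost_per_m_tokens = 1.00       # hypothetical $ per million tokens
rubin_cost_per_m_tokens = blackwell_cost_per_m_tokens / 10
monthly_tokens_m = 500_000               # hypothetical: 500B tokens per month
print(f"Blackwell bill: ${blackwell_cost_per_m_tokens * monthly_tokens_m:,.0f}/mo")
print(f"Rubin bill:     ${rubin_cost_per_m_tokens * monthly_tokens_m:,.0f}/mo")

# Training: 4x fewer GPUs for the same Mixture-of-Experts job.
blackwell_gpus = 4_096
rubin_gpus = blackwell_gpus // 4         # 1,024 - the example from the text
print(f"MoE cluster: {blackwell_gpus:,} Blackwell -> {rubin_gpus:,} Rubin GPUs")
```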
Why This Architecture Matters
Previous Nvidia generations were primarily about a better GPU. Rubin is different - it’s about system-level co-design.
The Vera CPU
For the first time, Nvidia has a serious server CPU. The Vera has 88 custom “Olympus” cores (Armv9.2), optimized specifically for agentic AI workloads - things like multi-step reasoning, tool use, and chain-of-thought inference where the CPU orchestrates while the GPU computes.
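To make “the CPU orchestrates while the GPU computes” concrete, here’s a deliberately simplified sketch of an agentic loop. The function names are hypothetical stand-ins, not any Nvidia or Vera API:

```python
# Simplified shape of an agentic workload: the CPU runs the control loop
# (parsing, tool dispatch, branching) while the GPU handles each model call.
# gpu_generate() and run_tool() are hypothetical stand-ins, not real APIs.

def gpu_generate(prompt: str) -> str:
    """Stand-in for a batched LLM inference call executing on the GPU."""
    return "final answer"  # placeholder response

def run_tool(call: str) -> str:
    """Stand-in for a CPU-side tool: web search, code execution, DB query."""
    return f"[result of {call.strip()}]"

def agent_loop(task: str, max_steps: int = 8) -> str:
    context = task
    for _ in range(max_steps):            # CPU: orchestration and branching
        reply = gpu_generate(context)     # GPU: the heavy tensor math
        if reply.startswith("TOOL:"):     # CPU: parse, decide, dispatch
            context += "\n" + run_tool(reply[len("TOOL:"):])
        else:
            return reply                  # model produced a final answer
    return context

print(agent_loop("Summarize the Rubin announcement"))
```

The point is the shape of the workload: lots of branchy, latency-sensitive CPU work wrapped around every GPU call - exactly what a big general-purpose core count is for.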
Meta is the first customer to deploy Vera CPUs as standalone processors in its data centers, separate from any GPUs. That’s a direct shot at Intel and AMD’s server CPU business.
The BlueField-4 DPU
Here’s a detail most coverage missed: the BlueField-4 has an integrated SSD for key-value cache. In LLM inference, the KV cache is often the bottleneck - it stores the attention keys and values for every token in the sequence, and it grows with both context length and batch size. By giving that cache a home on the DPU, Nvidia offloads a major memory bandwidth chokepoint from the GPU’s HBM.
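To see why, here’s the standard back-of-envelope KV cache sizing. The model dimensions are hypothetical - roughly a 70B-class model with grouped-query attention - not a spec for any Rubin-era model:

```python
# Back-of-envelope KV cache sizing for transformer inference.
# Model dimensions are hypothetical (roughly a 70B-class model with
# grouped-query attention); illustrative only, not any Rubin-era spec.

layers        = 80        # transformer layers
kv_heads      = 8         # key/value heads (grouped-query attention)
head_dim      = 128       # dimension per head
seq_len       = 128_000   # context length in tokens
bytes_per_val = 2         # fp16/bf16 cache entries

# K and V each store layers * kv_heads * head_dim values per token.
bytes_per_seq = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val
print(f"KV cache per sequence: {bytes_per_seq / 2**30:.1f} GiB")  # ~39 GiB

batch = 8
print(f"Batch of {batch}: {batch * bytes_per_seq / 2**30:.0f} GiB")  # ~312 GiB
```

At those sizes, a single long-context batch overflows even Rubin’s 288GB of HBM4 - exactly the situation an on-DPU SSD tier is built for.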
Cable-Free Rack Design
The NVL72 rack uses a modular, cable-free tray design that enables 18x faster assembly and servicing than Blackwell racks. When you’re deploying tens of thousands of racks across global data centers, this is an operational game-changer. Fewer cables = fewer failure points = higher uptime.
The $500 Billion Pipeline
During last quarter’s earnings, Nvidia disclosed $500 billion in combined Blackwell and Rubin visibility through the end of calendar 2026, with roughly $150 billion already shipped. The customers buying this hardware aren’t experimenting - they’re building permanent AI infrastructure.
The buyer list reads like a tech industry roster:
- Meta signed a multi-year deal for millions of Nvidia chips, including Vera CPUs and Rubin systems
- Microsoft, Google, and Amazon are all building Nvidia-powered AI data centers at unprecedented scale
- Oracle, CoreWeave, and Lambda are expanding GPU cloud capacity
The Mag 7 alone plan to spend roughly $700 billion on AI capex in 2026. A significant chunk of that flows directly through Nvidia.
The Roadmap: Rubin → Rubin Ultra → Feynman
Nvidia’s cadence is now annual, and the roadmap is public:
| Platform | Release | Performance |
|---|---|---|
| Blackwell | Shipping now | Baseline |
| Rubin | H2 2026 | 5x inference, 3.5x training |
| Rubin Ultra | H2 2027 | 100 PFLOPS FP4, up to 1TB HBM4e |
| Feynman | 2028 | 5-20x over Rubin (estimated) |
Rubin Ultra doubles Rubin’s compute to 100 petaflops and jumps to 1 TB of HBM4e memory per chip. For reference, that’s more memory than most servers have in total RAM today - on a single GPU.
Feynman, named after physicist Richard Feynman, is the 2028 architecture. Jensen has been teasing it as something that will “surprise the world.” Details are scarce, but it’s expected to push into HBM5 territory and target trillion-parameter models natively.
This annual cadence is the real moat. AMD and Intel have historically shipped new data center silicon on 2-3 year cycles. Nvidia is iterating every 12 months, making it nearly impossible for competitors to close the gap.
What This Means for the Industry
For AI companies
Inference costs dropping 10x changes the economics of every AI product. Features that were too expensive to ship at scale - real-time video generation, persistent AI agents, multi-modal reasoning - become viable. Expect a wave of AI products in late 2026 / early 2027 that only exist because Rubin made them affordable.
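One way to see the shift: a persistent agent that consumes tokens all day. The prices and usage numbers below are mine, not anyone’s published figures - only the 10x ratio comes from Nvidia:

```python
# Why 10x cheaper inference changes what ships: a persistent-agent example.
# Prices and usage numbers are hypothetical, for illustration only.

price_per_m_tokens = 2.00            # hypothetical $/1M tokens today
tokens_per_user_per_day = 2_000_000  # an always-on agent, hypothetical usage

monthly_cost = price_per_m_tokens * (tokens_per_user_per_day / 1e6) * 30
print(f"Per-user cost today:    ${monthly_cost:.2f}/month")      # $120 - hard to sell

rubin_monthly = monthly_cost / 10
print(f"Per-user cost on Rubin: ${rubin_monthly:.2f}/month")     # $12 - a viable tier
```

A feature that costs $120 per user per month is a demo; at $12 it’s a product.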
For Nvidia’s competitors
AMD’s MI300X offers 192GB of HBM3 memory at competitive pricing, and Intel is targeting the budget segment with Gaudi chips. But Rubin’s full-stack approach - CPU, GPU, DPU, networking, all co-designed - is incredibly difficult to replicate. Having a competitive GPU isn’t enough when Nvidia controls the entire rack.
For investors
Nvidia reports Q4 FY2026 earnings on February 25. The Street expects $65.7B revenue and $1.53 EPS. But the real signal is forward guidance. If Nvidia guides above the expected $71B for next quarter, it confirms Rubin pre-orders are accelerating. If below, the entire AI infrastructure thesis gets questioned.
With 39 analysts at “Strong Buy” and price targets ranging from $100 to $352, Nvidia is the most consequential earnings report of the quarter. Maybe the year.
The Bottom Line
Nvidia isn’t just selling GPUs anymore. With Rubin, they’re selling complete AI data center infrastructure - compute, memory, networking, storage, and security - in a single rack. The 10x cost reduction in inference and the annual upgrade cadence create a flywheel that competitors can’t easily break.
The question isn’t whether Rubin is impressive. It is. The question is whether the $700 billion bet on AI infrastructure will generate returns. If it does, Nvidia becomes the most important company of the decade. If it doesn’t, well - that’s a $700 billion problem for everyone, not just Nvidia.
Either way, the Rubin platform is Nvidia’s most ambitious launch since the original CUDA ecosystem. And Jensen Huang is already teasing what comes next.