Nvidia rewrote the AI processor map at GTC 2026 by folding Groq’s dataflow technology into Vera Rubin as a coprocessor—a move that solved a near-term bandwidth problem while revealing a longer-term one. The von Neumann architecture is approaching its limits, and two alternative processor families—photonic and neuromorphic—are advancing faster than the industry expected. Neither displaces Nvidia today, but both are converging on the moment when data center power constraints make them economically necessary.

Nvidia’s acquisition of Groq’s IP and its rapid integration into Vera Rubin at GTC 2026 solved one problem and exposed another. CEO Jensen Huang acknowledged it directly: “If you wanted to have services that deliver not 400 tokens per second, but a thousand tokens per second, NVLink72 runs out of steam, and you simply can’t get there. We just don’t have enough bandwidth. And so, this is where Groq comes in.”
Groq’s LPUs sacrifice memory capacity for speed: 500 MB of SRAM per chip, but 150 TB/s of bandwidth. A 256-LPU rack reaches roughly 40 PB/s in aggregate, enabling high-throughput inference over long context windows. Paired with Vera Rubin’s 50 PFLOPS of NVFP4 compute per GPU and HBM4 memory, the combined system addresses the bandwidth wall, for now. That wall, like the sound barrier, does not disappear; it moves. And the industry will hit it again within a few years, possibly sooner.
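Those rack-level figures follow directly from the per-chip numbers; a quick back-of-envelope check (interconnect overhead is ignored, so the aggregate is an upper bound):

```python
# Back-of-envelope check on the rack-level figures quoted above.
# Uses the per-chip numbers from the text and ignores interconnect
# overhead, so the aggregate is an upper bound, not a measured value.
SRAM_PER_CHIP_GB = 0.5   # 500 MB of SRAM per LPU
BW_PER_CHIP_TBS = 150    # 150 TB/s of bandwidth per LPU
CHIPS_PER_RACK = 256

rack_bw_pbs = BW_PER_CHIP_TBS * CHIPS_PER_RACK / 1000  # TB/s -> PB/s
rack_sram_gb = SRAM_PER_CHIP_GB * CHIPS_PER_RACK

print(f"Aggregate rack bandwidth: {rack_bw_pbs:.1f} PB/s")  # ~38.4 PB/s
print(f"Aggregate rack SRAM:      {rack_sram_gb:.0f} GB")   # 128 GB
```

The same arithmetic exposes the tradeoff: a full rack delivers tens of petabytes per second of bandwidth against only 128 GB of total weight capacity.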
Dataflow and compute-in-memory improvements will carry AI workloads further. But the von Neumann processor-memory architecture, even augmented, has a ceiling. Two alternative families are developing on the sidelines: photonic processors and neuromorphic processors. They diverge in physics, addressable markets, and readiness. Conflating them misframes the competitive landscape.
Photonic AI processors
The physics advantage is real and undisputed. Matrix-vector multiplication in the optical domain consumes near-zero switching energy because computation happens as light propagates through material; photons generate no resistive heat. Q.ANT claims a 30× energy-efficiency gain over conventional chips. Neurophos’ Tulkas T100 targets 300–350 TOPS/W. Lightmatter’s Envise claims significant performance-per-watt gains. These are pre-commercial figures, but they derive from legitimate physics, not marketing projection.
For matrix-heavy LLM prefill inference, a photonic coprocessor running at 30 W against a GPU running at 700–1,000 W represents a structural cost advantage that silicon node shrinks cannot eliminate. Nvidia Vera Rubin’s strength is not matrix-multiply efficiency—it is programmability, precision, memory bandwidth, and the fact that every AI framework on the planet runs on it. Switching to photonic coprocessing requires rewriting software stacks, validating numerical precision, and redesigning deployment pipelines. None of the photonic companies have solved this at scale, though Arago and Q.ANT both emphasize PyTorch compatibility to address the barrier.
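To see why node shrinks cannot close that gap, a minimal energy-per-token sketch, assuming the wattages above and an equal throughput for both devices; the 400 tokens-per-second figure is a placeholder for the comparison, not a vendor claim:

```python
# Illustrative energy-per-token comparison. Power figures come from the
# paragraph above; the shared 400 tokens/s throughput is a placeholder
# chosen only to show the arithmetic, not a measured or vendor number.
def joules_per_token(power_watts: float, tokens_per_sec: float) -> float:
    return power_watts / tokens_per_sec

gpu_j = joules_per_token(power_watts=900, tokens_per_sec=400)
photonic_j = joules_per_token(power_watts=30, tokens_per_sec=400)

print(f"GPU:      {gpu_j:.3f} J/token")                             # 2.250
print(f"Photonic: {photonic_j:.3f} J/token")                        # 0.075
print(f"Advantage: {gpu_j / photonic_j:.0f}x at equal throughput")  # 30x
```

At equal throughput the advantage is simply the power ratio, which is why it survives any node shrink that leaves the wattage gap intact.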
Groq is a more direct architectural competitor to photonic inference than Vera Rubin is. Both target inference efficiency over training flexibility. Both sacrifice programmability for throughput. Groq ships commercially today, runs Llama-class models at 1,300+ tokens per second, and carries no optical-to-electronic conversion overhead. Its constraint is memory: SRAM-only capacity limits model size without multi-chip scaling, as the sketch below quantifies. The Tulkas T100’s 768 GB of HBM gives it a large-model advantage over Groq, but it doesn’t ship until 2028. Q.ANT ships today but targets nonlinear AI and physics simulation, not LLM token generation.
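A hedged sketch of that memory constraint, assuming FP8 weights at one byte per parameter and the 500 MB-per-chip figure above; activations, KV cache, and weight replication are ignored, so real chip counts run higher:

```python
import math

# Minimum chips needed just to hold a model's weights in SRAM.
# Assumes FP8 (1 byte per parameter) and 500 MB of SRAM per chip, per the
# text. Ignores activations, KV cache, and replication, so this is a floor.
def chips_needed(params_billions: float, sram_per_chip_gb: float = 0.5,
                 bytes_per_param: int = 1) -> int:
    weight_gb = params_billions * bytes_per_param  # 1e9 params at 1 B = 1 GB
    return math.ceil(weight_gb / sram_per_chip_gb)

for p in (8, 70, 405):  # Llama 3 model sizes, in billions of parameters
    print(f"{p}B params -> at least {chips_needed(p)} chips")
# 8B -> 16, 70B -> 140, 405B -> 810
```

Even a 70B-parameter model needs on the order of 140 chips before a single activation is stored, which is why multi-chip scaling is not optional for SRAM-only designs.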
The correct framing is not photonic versus GPU. It is photonic coprocessor alongside GPU—which is how Q.ANT and Neurophos both position their products. The inflection arrives when data center operators face a binary choice: build a new power substation to run more Nvidia racks, or deploy photonic coprocessors that deliver equivalent inference throughput at a fraction of the energy cost. Energy-constraint data suggests that window is opening faster than most analysts expected.
Neuromorphic processors
Neuromorphic is not one architecture—and that distinction matters more than it does for photonics, because photonic processors share the same basic physics advantage. Neuromorphic architectures diverge sharply across three types.
Spiking neural networks (SNNs) such as Intel Loihi 2, BrainChip Akida, University of Manchester Tomorrow Labs’ SpiNNaker, and Innatera T1 compute only when a neuron fires, consuming near-zero energy between events. They excel on sparse, event-driven data: sensor fusion, audio, edge vision. They perform poorly on dense transformer inference.
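A minimal sketch of that event-driven principle, using a leaky integrate-and-fire neuron, the textbook SNN building block; the parameters are arbitrary illustrations, not tied to any of the chips named above:

```python
# Leaky integrate-and-fire neuron, the textbook SNN primitive. Synaptic
# work happens only when an input spike arrives; the decay step is
# implemented passively in analog hardware. Parameters are arbitrary.
def lif_neuron(input_spikes, weight=0.6, leak=0.9, threshold=1.0):
    potential = 0.0
    output = []
    for spike in input_spikes:   # one timestep per entry
        potential *= leak        # membrane decay between events
        if spike:                # compute only on an incoming event
            potential += weight
        if potential >= threshold:
            output.append(1)     # fire...
            potential = 0.0      # ...and reset
        else:
            output.append(0)
    return output

print(lif_neuron([1, 0, 1, 1, 0, 0, 1, 1]))  # [0, 0, 1, 0, 0, 0, 1, 0]
```

A sparse input stream means most timesteps do no synaptic work at all, which is exactly the property dense transformer inference fails to offer.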
Analog in-memory compute—IBM NorthPole, Mythic M1076—stores weights directly in memory cells and performs matrix multiply in place, eliminating the data-movement cost that dominates transformer inference energy. IBM NorthPole achieves 25× better energy efficiency than a comparable GPU on ResNet-50 and runs at 22 TOPS/W with no external memory. For edge-to-server CNN inference, that efficiency argument is demonstrated on silicon, not just claimed.
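One way to sanity-check the 22 TOPS/W figure: a rough energy-per-image estimate, assuming roughly 8 GOPs per ResNet-50 inference (counting multiplies and adds separately) and perfect utilization, which no real chip reaches; the 1 TOPS/W GPU baseline is an assumption for comparison, not a measured number:

```python
# Rough energy-per-image estimate from a TOPS/W rating. Assumes ~8 GOPs
# per ResNet-50 inference (multiplies and adds counted separately) and
# perfect utilization, so treat both results as optimistic lower bounds.
def millijoules_per_image(tops_per_watt: float, gops_per_image: float = 8.0) -> float:
    joules = (gops_per_image * 1e9) / (tops_per_watt * 1e12)
    return joules * 1e3

print(f"NorthPole-class (22 TOPS/W):  {millijoules_per_image(22):.2f} mJ/image")
print(f"GPU at 1 TOPS/W (assumption): {millijoules_per_image(1):.2f} mJ/image")
```

Under these assumptions the ratio lands in the same ballpark as IBM’s published 25× figure, which is what makes the silicon-demonstrated claim credible.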
BrainChip Akida targets sub-milliwatt edge inference—wearables, sensors, IoT. For keyword spotting and gesture recognition at that power level, no GPU competes on energy per inference. That is not, however, the market Vera Rubin targets.
The fundamental problem for neuromorphic in the LLM market is architectural. Transformer attention is dense and demands high-precision floating point across enormous weight matrices. Spiking neural networks were designed around sparse, temporal, event-driven signals: the structural opposite of what LLMs require. No neuromorphic chip today runs GPT-class or Llama-class models efficiently. A Groq LPU running Llama 3 at 1,300+ tokens per second has no neuromorphic competition.
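The density argument can be put in numbers: a sketch of per-token attention FLOPs at decode time, using standard transformer cost formulas with Llama-70B-like dimensions assumed for illustration (GQA, which shrinks the K/V projections, is ignored):

```python
# Per-token FLOPs for one attention layer at decode time. Every new token
# attends to the entire cached context, so the work is dense and grows
# linearly with context length. Dimensions are Llama-70B-like ballparks.
def attention_flops_per_token(d_model: int, context_len: int) -> float:
    qkv_proj = 3 * 2 * d_model * d_model  # Q, K, V projections (no GQA)
    scores   = 2 * context_len * d_model  # Q . K^T against the KV cache
    weighted = 2 * context_len * d_model  # attention-weighted sum of V
    out_proj = 2 * d_model * d_model      # output projection
    return qkv_proj + scores + weighted + out_proj

d_model, context_len, layers = 8192, 128_000, 80
total = attention_flops_per_token(d_model, context_len) * layers
print(f"~{total / 1e9:.0f} GFLOPs per generated token, attention alone")  # ~378
```

Hundreds of billions of dense multiply-accumulates per token, with no event sparsity to exploit, is the structural opposite of the workload spiking hardware was built for.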
What separates them
Photonic processors are energy-efficient at the same workloads GPUs handle—dense matrix multiply for transformer inference. That makes them plausible coprocessors or replacements in the same data center rack. Neuromorphic processors are energy-efficient at fundamentally different workloads—sparse, event-driven, temporal data. Neuromorphic’s best case is edge AI, autonomous systems, and sensor processing. Photonics’ best case is data center inference coprocessing alongside conventional GPUs.
The only competitive overlap is server-side CNN inference, where IBM NorthPole and photonic inference processors meet head-to-head; on LLM token generation, both lose to Groq. They belong in separate database categories because they serve separate markets, not because one is better than the other.
Neither displaces Nvidia before 2028 at the earliest. Both are advancing faster than the industry expected two years ago. The von Neumann architecture, extended through Groq’s dataflow integration, will serve well for several more years. By the time its ceiling becomes unavoidable, photonic and neuromorphic refinements will have progressed further—and the race to establish software ecosystems alongside novel silicon will determine who leads the next platform era.
What do we think?
Nvidia’s Groq integration is a tactical fix, not a structural solution. It buys three to five years before the memory wall forces a more fundamental shift. Photonic coprocessors address the same inference workloads GPUs handle, making them plausible rack-level additions as power budgets hit physical limits. Neuromorphic addresses fundamentally different workloads (sparse, event-driven, edge AI) and competes in a separate arena. We view photonic as the more direct long-term challenger to GPU inference economics, but the software ecosystem gap remains the primary obstacle. Neither technology displaces Nvidia in the near term.
Nvidia’s GTC admission that NVLink72 “runs out of steam” at 1,000 tokens per second marks an inflection point: not in AI capability, but in AI processor architecture. For a decade, scaling meant more transistors, more memory bandwidth, and larger interconnect fabrics. That approach now has a visible ceiling. Photonic and neuromorphic processors are no longer academic research projects; they are architectural candidates for the era that follows. The race to define that next platform is underway, and the companies that pair novel silicon with production-ready software ecosystems will determine who leads it.
LIKE WHAT YOU’RE READING? TELL YOUR FRIENDS; WE DO THIS EVERY DAY, ALL DAY.