
Blaize GSP streams AI graphs at the edge

A hardware scheduler replaces the memory bottleneck—16 TOPS at 7 W.

Jon Peddie

Blaize built its Graph Streaming Processor around a deceptively simple insight: Neural networks are graphs, so run them as graphs. The GSP’s on-chip hardware scheduler streams the computational graph depth-first through 16 cores, keeping intermediate data in cache rather than round-tripping to external DRAM. The result is 16 TOPS at 7 W on Samsung 14 nm—and a software stack that hides all of that complexity from the developer. The GSP2, targeting 1,600 TOPS on 5 nm for automotive ADAS, extends the same architecture to a far larger stage.

Blaize, founded in 2011 in El Dorado Hills, California—originally as ThinCI—spent nearly a decade developing a processor architecture it calls the Graph Streaming Processor, or GSP. The commercial product, code-named El Cano, arrived in 2020 on Samsung 14 nm. It was Blaize’s second silicon effort; a 28 nm test chip preceded it in 2017, providing the memory characterization data that shaped the production design.

The architecture

The GSP’s defining characteristic is graph-native execution. Conventional AI accelerators receive a neural network, decompose it into matrix operations, and execute those operations sequentially or in batches—a process that requires repeatedly staging intermediate results through external DRAM. The GSP treats the neural network as what it actually is: a directed graph. The hardware scheduler reads a metamap of the computational graph generated at compile time by the NetDeploy tool, then distributes subtasks across the 16 cores autonomously, depth-first.

Depth-first scheduling means that as soon as a layer produces intermediate results, the next layer begins consuming them from cache, not from DRAM. Intermediate activations never leave the chip. The 4 Mb on-chip memory hierarchy, structured as a single unified address space, holds the working data for the active graph traversal. The cores themselves never manage scheduling; the scheduler holds that authority entirely. This inversion (schedulers in charge, cores as execution resources) is the central architectural decision that distinguishes the GSP from VLIW and GPU compute models.
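The memory effect of depth-first traversal can be illustrated with a toy model. The sketch below is not Blaize's scheduler. It assumes a simple chain of element-wise layers (real convolutions have overlapping receptive fields, which complicates tiling), and the function names and the "live values" metric are hypothetical. It compares how much data must be held at once when each layer is finished for the whole input before the next begins, versus when one tile is pushed through every layer before the next tile starts.

```python
from typing import Callable, List, Tuple

Tile = List[float]
Layer = Callable[[Tile], Tile]


def run_layer_by_layer(layers: List[Layer], tiles: List[Tile]) -> Tuple[List[Tile], int]:
    """Finish each layer for every tile before starting the next layer."""
    current = tiles
    peak = 0
    for layer in layers:
        produced = [layer(t) for t in current]
        # The layer's whole input and whole output are live at the same time.
        live = sum(len(t) for t in current) + sum(len(t) for t in produced)
        peak = max(peak, live)
        current = produced
    return current, peak


def run_depth_first(layers: List[Layer], tiles: List[Tile]) -> Tuple[List[Tile], int]:
    """Push one tile through every layer before touching the next tile."""
    results = []
    peak = 0
    for tile in tiles:
        data = tile
        for layer in layers:
            out = layer(data)
            # Only one tile's input and output are live per step.
            peak = max(peak, len(data) + len(out))
            data = out
        results.append(data)
    return results, peak


# Illustrative three-layer chain: scale, bias, ReLU (all element-wise).
layers = [
    lambda t: [2.0 * x for x in t],
    lambda t: [x + 1.0 for x in t],
    lambda t: [max(0.0, x) for x in t],
]
# Four tiles of four values each, covering -8.0 .. 7.0.
tiles = [[float(4 * j + i) - 8.0 for i in range(4)] for j in range(4)]

out_wide, peak_wide = run_layer_by_layer(layers, tiles)
out_deep, peak_deep = run_depth_first(layers, tiles)
print(out_wide == out_deep, peak_wide, peak_deep)  # True 32 8
```

Both orderings produce identical results, but the depth-first working set is bounded by the tile size rather than the full activation size. That is the property that lets intermediate data stay in a small on-chip memory.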

Each of the 16 cores is an array of 4-bit load-store execution units that combine at runtime to handle operations from INT8 up to FP16. A 2D register file with a contiguous address space feeds data to execution pipelines. The hardware supports task-level, thread-level, data-level, and instruction-level parallelism simultaneously, allowing multiple independent graphs or graph segments to run concurrently across the chip.
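The article does not say how the 4-bit units combine. As a general illustration of the idea, and not a description of Blaize's circuitry, the sketch below builds an unsigned 8-bit multiply from four 4-bit partial products using the standard schoolbook decomposition. The function names are hypothetical, and signed and floating-point formats would need additional handling.

```python
def mul4(x: int, y: int) -> int:
    """Stand-in for one 4-bit x 4-bit multiplier (assumed primitive)."""
    assert 0 <= x < 16 and 0 <= y < 16
    return x * y


def mul8_from_4bit(a: int, b: int) -> int:
    """Unsigned 8-bit multiply composed from four 4-bit partial products."""
    assert 0 <= a < 256 and 0 <= b < 256
    a_hi, a_lo = a >> 4, a & 0xF  # split each operand into two nibbles
    b_hi, b_lo = b >> 4, b & 0xF
    # a * b = (a_hi*b_hi << 8) + ((a_hi*b_lo + a_lo*b_hi) << 4) + a_lo*b_lo
    return (
        (mul4(a_hi, b_hi) << 8)
        + ((mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4)
        + mul4(a_lo, b_lo)
    )


print(mul8_from_4bit(200, 123) == 200 * 123)  # True
```

The same decomposition accounts for the usual throughput trade-off in such designs: an 8-bit multiply consumes four 4-bit multiplies, so narrower formats leave more units free to work in parallel.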

Figure 1. The GSP processor block diagram. (Source: Blaize)

The architecture also deliberately conceals microarchitectural detail from software. The SIMD vector width, pipeline depth, and core granularity do not surface to the programmer. NetDeploy handles quantization, graph optimization, and partitioning—including cascading a single model across multiple chips or cards using seven available scheduling mechanisms. A model compiled for one GSP configuration requires no code changes to run on a scaled-out multi-card deployment.
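NetDeploy's partitioning algorithm and its seven scheduling mechanisms are not described in the article. To give a sense of what cascading a model across chips involves, here is a minimal sketch under simple assumptions: the model is a linear chain of layers, each layer has a known integer cost, and the goal is to cut the chain into contiguous segments so that the most heavily loaded device carries as little as possible. The function names and cost figures are hypothetical.

```python
from typing import List


def segments_needed(costs: List[int], cap: int) -> int:
    """Count contiguous segments needed if no segment may exceed cap."""
    count, load = 1, 0
    for c in costs:
        if load + c > cap:
            count += 1
            load = c
        else:
            load += c
    return count


def partition_chain(costs: List[int], devices: int) -> List[List[int]]:
    """Split layer indices into at most `devices` contiguous segments,
    minimizing the largest per-device cost (binary search on the cap)."""
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if segments_needed(costs, mid) <= devices:
            hi = mid
        else:
            lo = mid + 1
    # Rebuild the segments greedily using the smallest feasible cap.
    parts, current, load = [], [], 0
    for i, c in enumerate(costs):
        if current and load + c > lo:
            parts.append(current)
            current, load = [], 0
        current.append(i)
        load += c
    parts.append(current)
    return parts


# Illustrative per-layer costs for a six-layer chain split over three devices.
print(partition_chain([4, 2, 6, 3, 5, 1], 3))  # [[0, 1], [2, 3], [4, 5]]
```

A production tool would also have to weigh inter-device transfer cost and branching graph topologies, which this sketch ignores.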

The El Cano product

El Cano delivers 16 TOPS at INT8, 32 TOPS at INT4, and 2 TOPS at FP64, all within a 7 W TDP. Samsung 14 nm fabrication keeps die cost low—Blaize’s VP of Strategic Business Development Richard Terrill noted at launch that the chip has “not lots of on-chip memory,” deliberately keeping die area and cost accessible for industrial and aftermarket automotive deployments.
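The efficiency implied by these figures is simple arithmetic. The short calculation below divides the stated peak integer throughput by the stated TDP. This gives a nominal ratio rather than a measured one, since sustained throughput and actual power draw depend on the workload.

```python
TDP_WATTS = 7.0
PEAK_TOPS = {"INT8": 16.0, "INT4": 32.0}  # peak figures as stated in the article


def tops_per_watt(tops: float, watts: float) -> float:
    """Nominal efficiency: peak throughput divided by TDP."""
    return tops / watts


for precision, tops in PEAK_TOPS.items():
    print(f"{precision}: {tops_per_watt(tops, TDP_WATTS):.2f} TOPS/W")
# INT8: 2.29 TOPS/W
# INT4: 4.57 TOPS/W
```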

Three hardware products build on El Cano. The P1600 SoM (System on Module) integrates 16 GSP cores with dual Arm Cortex-A53 CPUs, H.264/H.265 video encode and decode, and camera sensor interfaces. The X1600E PCIe card targets edge server deployments at $299. The X1600P PCIe card targets higher-throughput installations at $999. All three run the Blaize AI Software Suite.

The software stack operates at two levels. Blaize AI Studio provides a no-code visual interface for building and deploying AI applications—targeted at system integrators and domain experts who are not ML engineers. Picasso SDK provides full OpenCL C++ access for developers who need direct hardware control. NetDeploy sits underneath both, handling model conversion from TensorFlow and other frameworks, quantization-aware optimization, and multi-device graph partitioning.

GSP2 and the automotive roadmap

El Cano is a commercial-grade part aimed at industrial edge AI, smart cameras, and automotive prototyping. The automotive production part is the GSP2, Blaize’s next-generation SoC, which extends the graph-streaming architecture to Samsung 5 nm and targets 1,600 TOPS for ADAS sensor fusion and perception. Blaize’s 2025 10-K filing places GSP2 production availability no earlier than 2028. DENSO and Mercedes-Benz have collectively paid $32 million in engineering fees against anticipated GSP2 delivery, signaling intent from both a Tier 1 supplier and an OEM.

Blaize went public in January 2025 via SPAC merger as Blaize Holdings on Nasdaq (BZAI), at a $1.2 billion valuation. The 2025 10-K flagged going-concern doubt, with a $103.8 million operating loss driven heavily by low-margin third-party GPU hardware resale. GSP2’s production timeline and the company’s ability to fund that roadmap represent the central financial risk for enterprise and automotive customers evaluating Blaize as a long-term silicon partner.

Target applications

El Cano’s industrial-grade process qualification and multi-sensor support make it practical for deployment environments that consumer-grade edge silicon cannot handle. Current application areas include surround-view automotive ADAS at Level 2+, retail analytics and loss prevention, industrial inspection on factory lines, and smart city camera systems. The GSP’s ability to run multiple independent neural networks simultaneously on one chip—processing camera, radar, and lidar streams concurrently—makes it particularly relevant to sensor fusion workloads where a single-model accelerator forces sequential processing.

Blaize’s graph-streaming architecture solves a real problem—off-chip memory bandwidth as the binding constraint in edge AI inference—with a hardware scheduler approach that keeps intermediate data on-chip and hides microarchitectural complexity from software developers. El Cano validated that approach in production silicon at 7 W. The gap between El Cano’s 16 TOPS and GSP2’s 1,600 TOPS target reflects the full ambition of the automotive ADAS opportunity Blaize is pursuing. Whether the company’s financial position supports the development cycle that gap requires is the open question for customers making long-cycle design commitments now.

What do we think?

The GSP architecture is technically differentiated—graph-native scheduling with depth-first execution and unified on-chip memory genuinely reduces the DRAM bandwidth problem that constrains most edge AI accelerators. El Cano proved it works. The risk is entirely financial: a $103.8 million operating loss, a 2028 GSP2 timeline, and going-concern language in the 10-K create real supplier continuity risk for automotive programs with five-year design cycles.

The GSP’s graph-native execution model marks an inflection point in edge AI architecture thinking—the recognition that mapping neural networks onto matrix-multiply hardware wastes cycles on a representation mismatch, and that building the hardware around the graph eliminates that waste at the source. That inflection point is visible across the industry: dataflow and graph-native architectures from Blaize, Tenstorrent, and SambaNova all reflect the same conclusion. The question is no longer whether graph-native execution outperforms matrix-centric designs at the edge—it does. The question is which company survives to scale it.

For any procurement team weighing a long-cycle platform decision involving Blaize hardware, the DENSO and Mercedes-Benz design-win signals must be set against the company’s financial risk.

Blaize’s GSP is one of the 152 AI processors in our AI Processor Tracking Service, which also lists performance and other specifications for 291 products.
