AMD is ready to ship Halo

Announced at CES, the AI developer box for your desk is ready.

Jon Peddie

AMD’s Ryzen AI Halo mini PC, arriving in June 2026, gives AI developers a compact x86 workstation built around the Ryzen AI Max+ 395 (Strix Halo APU) with 128 GB of unified LPDDR5X memory, 40 RDNA 3.5 GPU compute units, and a full ROCm software stack preloaded for both Windows and Linux. It targets the same developer audience as Nvidia’s DGX Spark—at a price AMD expects to undercut by a meaningful margin.

AMD CEO Lisa Su introduced the Ryzen AI Halo during her CES 2026 keynote, positioning it as a reference AI developer platform rather than a consumer PC. The distinction matters. AMD does not typically sell branded systems; the Halo marks the company’s first foray into first-party hardware, following Nvidia’s move with the DGX Spark. AMD Senior VP Jack Huynh confirmed a June 2026 ship date at the company’s AI DevDay event in San Francisco, where he demonstrated the device running LM Studio, ComfyUI, and Visual Studio Code—the three applications that define most local LLM developer workflows today.

At the core sits the Ryzen AI Max+ 395, the flagship Strix Halo APU. This is a chiplet design: Two eight-core Zen 5 CPU dies connect to a large I/O die that carries an integrated RDNA 3.5 Radeon GPU and an XDNA 2 NPU. The whole package runs on a 256-bit LPDDR5X memory bus with 128 GB of unified capacity—the same pool shared by the CPU, GPU, and NPU simultaneously. A model that fills that memory does not need to page data between discrete VRAM and system RAM. For LLM inference, where memory bandwidth and total capacity determine how large a model you can run at what speed, that architecture matters more than peak FLOPS counts.

Strix Halo uses a die-to-die interconnect AMD calls a “sea of wires”—a direct fabric link between the Zen 5 CCDs and the GPU/IO die, running at matched clock speeds with 32 bytes per cycle in each direction. This eliminates the power and latency overhead of a traditional GMI PHY. The result: The CPU and GPU operate against the same physical memory pool at full bandwidth, without the host-to-device transfer overhead that makes discrete GPU setups inefficient for large model inference.

The RDNA 3.5 iGPU adds 32 MB of Infinity Cache (MALL) on top, delivering over 40% more effective bandwidth than its L2 cache alone. For workloads that fit in the cache hierarchy, the GPU sees bandwidth comparable to midrange discrete parts. For workloads that spill into DRAM, which most LLMs do, the 256 GB/s unified bus does the work. The XDNA 2 NPU handles always-on inference tasks: keyword spotting, real-time transcription, and background processing that would otherwise keep the GPU awake and burning power.
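A rough way to read the 256 GB/s figure: peak DRAM bandwidth is bus width times transfer rate, and a dense model that streams all of its weights once per generated token cannot decode faster than bandwidth divided by weight size. The sketch below works through that arithmetic. The 8,000 MT/s transfer rate is an assumption, chosen because it reproduces the article's 256 GB/s on a 256-bit bus, and the 64 GB model is a hypothetical example. The ceiling ignores cache hits, KV-cache reads, and compute limits.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


def decode_ceiling_tokens_per_s(bandwidth_gbs: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode speed for a dense model,
    assuming every weight is read from DRAM once per generated token."""
    return bandwidth_gbs / weights_gb


# Assumption: LPDDR5X running at 8000 MT/s (not stated in the article).
ASSUMED_TRANSFER_RATE_MTS = 8000
# Hypothetical model whose quantized weights occupy 64 GB.
HYPOTHETICAL_WEIGHTS_GB = 64.0

bandwidth = peak_bandwidth_gbs(256, ASSUMED_TRANSFER_RATE_MTS)
ceiling = decode_ceiling_tokens_per_s(bandwidth, HYPOTHETICAL_WEIGHTS_GB)

print(f"Peak bandwidth: {bandwidth:.0f} GB/s")
print(f"Decode ceiling for a {HYPOTHETICAL_WEIGHTS_GB:.0f} GB model: {ceiling:.1f} tokens/s")
```

Real throughput will land below this ceiling. The point is only that for models of this size, bandwidth rather than FLOPS sets the upper bound.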

“128 GB of unified memory with no host-to-device transfer overhead changes the local LLM calculus—the question is whether ROCm delivers on the software side.”

ROCm: The real test

AMD ships the Halo with ROCm 7.2.2 preloaded and promises Day-Zero support for leading open-weight models including GPT-OSS variants, FLUX.2, and SDXL. The company commits to Windows and Linux support simultaneously—a deliberate contrast with Nvidia’s DGX Spark, which launched Linux-only.

That software promise carries weight because RDNA 3.5 uses a different GPU architecture than AMD’s Instinct data center accelerators, which run the CDNA architecture that ROCm has historically optimized for. Making RDNA 3.5 a first-class ROCm citizen requires specific kernel and driver work. AMD has stated it will treat the Strix Halo APU as a primary target within the ROCm ecosystem going forward, which represents a meaningful commitment—one the developer community will test against Nvidia’s mature CUDA stack the moment devices ship.
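For developers who want to check the ROCm claim on their own machine, a minimal sketch follows. It relies on two pieces of PyTorch behavior: ROCm builds expose the GPU through the same torch.cuda interface as CUDA builds, and torch.version.hip is set only on ROCm builds. The function name and the returned labels are hypothetical, and nothing here is specific to the Halo or to ROCm 7.2.2.

```python
def describe_torch_backend() -> str:
    """Report which GPU backend, if any, the installed PyTorch build can use.

    The returned labels are illustrative, not part of any PyTorch API.
    """
    try:
        import torch
    except ImportError:
        return "pytorch-not-installed"

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip_version = getattr(torch.version, "hip", None)

    # ROCm builds reuse the torch.cuda namespace, so this call covers both stacks.
    if not torch.cuda.is_available():
        return "no-gpu-visible"
    if hip_version:
        return f"rocm-hip-{hip_version}"
    return "cuda"


backend = describe_torch_backend()
print(backend)
```

A result beginning with rocm-hip means PyTorch can see a GPU through ROCm. It says nothing about whether a particular model or kernel performs well, which is the part independent benchmarks will have to settle.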

The networking question remains open. Nvidia’s DGX Spark ships with dual 200 Gb/s ConnectX-7 QSFP ports (a $1,700 component) that let two Spark units link natively for 405-billion-parameter model runs. AMD has the Pensando Pollara NIC in its portfolio but has not confirmed whether the Halo includes high-speed networking by default. That gap matters for developers who need multi-node scale-out, not just single-box inference.

The competitive position

The DGX Spark Founders Edition now carries a $4,699 price tag after Nvidia raised it 18% in February 2026, citing LPDDR5X memory supply constraints. AMD has not announced Halo pricing, but industry estimates place it between $2,000 and $3,000—a gap of $1,700 to $2,700 against the Spark. At the lower end of that range, the Halo reaches developers and research teams who find the Spark’s price prohibitive. It also runs standard x86 Windows applications natively, which the Arm-based GB10 cannot without emulation.
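The pricing arithmetic is easy to check. The sketch below uses only the figures quoted above. The implied pre-increase price assumes the 18% figure is exact, which it may not be, since the percentage is presumably rounded.

```python
SPARK_PRICE_USD = 4699               # DGX Spark Founders Edition, per the article
HALO_ESTIMATE_USD = (2000, 3000)     # industry estimate range, not an AMD figure
SPARK_INCREASE = 0.18                # February 2026 increase, assumed exact here

# Gap against the Spark at the top and bottom of the Halo estimate range.
smallest_gap = SPARK_PRICE_USD - HALO_ESTIMATE_USD[1]
largest_gap = SPARK_PRICE_USD - HALO_ESTIMATE_USD[0]

# Spark price before the increase, if the 18% figure were exact.
implied_prior_price = SPARK_PRICE_USD / (1 + SPARK_INCREASE)

print(f"Gap: ${smallest_gap:,} to ${largest_gap:,}")
print(f"Implied Spark price before the increase: about ${implied_prior_price:,.0f}")
```

The gaps come out to $1,699 and $2,699, which the article rounds to $1,700 and $2,700.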

The Spark counters with FP4 precision, a Blackwell-exclusive capability that lets it run models up to 200 billion parameters at 4-bit quantization, and with the full CUDA ecosystem behind every kernel and framework. AMD’s RDNA 3.5 stops at FP16/INT8. For pure LLM inference at scale, that precision gap costs throughput. For developers prototyping, fine-tuning smaller models, or running image generation workloads, the gap narrows considerably.
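The 200-billion-parameter figure follows from simple storage arithmetic: weight footprint is parameter count times bits per weight. The sketch below estimates that footprint at three precisions and compares it with the Halo's 128 GB of unified memory. It counts weights only and ignores the KV cache, activations, and whatever memory the operating system reserves, so real headroom is smaller than these numbers suggest.

```python
UNIFIED_MEMORY_GB = 128  # Halo capacity, per the article


def weights_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * bits_per_weight / 8


for label, bits in (("4-bit", 4), ("8-bit", 8), ("16-bit", 16)):
    size_gb = weights_footprint_gb(200, bits)
    verdict = "fits" if size_gb <= UNIFIED_MEMORY_GB else "does not fit"
    print(f"200B parameters at {label}: {size_gb:.0f} GB, {verdict} in {UNIFIED_MEMORY_GB} GB")
```

At 4 bits the weights take 100 GB. At 8 and 16 bits they take 200 GB and 400 GB, which is why a model of that size runs on a single box of this class only with 4-bit weights.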

What do we think?

The Ryzen AI Halo is the right product at the right moment—AMD’s hardware story is credible, the price position is strong, and Windows support out of the box removes a real barrier. The variable is ROCm. If AMD delivers Day-Zero model compatibility and stable RDNA 3.5 performance across PyTorch, TensorFlow, and ONNX Runtime, the Halo gives developers a genuine x86 alternative to the Spark. If ROCm support is uneven at launch, price alone will not close the software ecosystem gap. Watch the first independent benchmarks closely.

Inflection signal

The Ryzen AI Halo signals an inflection point in where serious AI development hardware lives. For two years, the local LLM workstation was Apple Silicon or a discrete GPU rig. The entry of AMD and Nvidia with purpose-built developer boxes, both priced well below data center hardware and both running large models on a desk, marks the moment desktop AI compute becomes a product category in its own right. The next inflection point arrives when one of them ships a second-generation part with a meaningfully better memory bandwidth story.

Here’s how the AMD Halo compares to the Nvidia Spark.