TechWatch

Microsoft’s Maia 200 inference AIP

Triples computational throughput while reducing inference latency and cost per token.

Jon Peddie

Microsoft introduced Maia 200, its second-generation AI inference AIP, fabricated on TSMC’s 3 nm process with 140 billion transistors. The chip delivers 10 PFLOPS of FP4 performance within a 750 W envelope and integrates 216 GB of HBM3e memory and 272 MB of on-die SRAM. Two execution engines handle tensor and vector operations, while specialized DMA subsystems manage data flow. An on-die Ethernet NIC enables clusters of up to 6,144 accelerators. Microsoft is deploying Maia 200 in its Central US data-center region, targeting OpenAI GPT-5.2 models, Microsoft 365 Copilot, and synthetic data-generation workloads across Azure infrastructure. (Source: Microsoft)
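For scale, here is a quick back-of-the-envelope sketch in Python of what those headline numbers imply, using only the specifications quoted above. The efficiency and cluster totals are theoretical peaks, not measured inference throughput, which will depend on model, batch size, and memory bandwidth.

```python
# Derived figures from Microsoft's published Maia 200 specs (peak values only).
FP4_PFLOPS = 10        # peak FP4 throughput per chip (PFLOPS)
TDP_W = 750            # power envelope per chip (W)
HBM3E_GB = 216         # on-package HBM3e capacity (GB)
MAX_CLUSTER = 6_144    # maximum accelerators per cluster

# Peak compute efficiency: convert PFLOPS to TFLOPS, divide by watts.
tflops_per_watt = FP4_PFLOPS * 1_000 / TDP_W
print(f"Peak FP4 efficiency: {tflops_per_watt:.1f} TFLOPS/W")  # ~13.3

# Aggregate peak figures for a fully built-out 6,144-chip cluster.
cluster_eflops = FP4_PFLOPS * MAX_CLUSTER / 1_000
cluster_hbm_tb = HBM3E_GB * MAX_CLUSTER / 1_000
print(f"Cluster peak: {cluster_eflops:.1f} EFLOPS FP4, "
      f"{cluster_hbm_tb:,.0f} TB HBM3e")  # ~61.4 EFLOPS, ~1,327 TB
```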
...

