Arm Ethos-N78 scales NPU IP to 10 TOPS

The Arm Ethos-N78 is the NPU IP core inside a significant fraction of the Android phones shipped since 2021. It’s not a chip—it’s licensable RTL that SoC vendors integrate alongside Cortex-A CPUs and Mali GPUs. The N78 scales from 1 to 10 TOPS across 90-plus configurations, cuts DRAM bandwidth by 40% versus its predecessor, and runs the full framework stack through the Vela compiler and Android NNAPI. MediaTek Dimensity 9000 and Samsung Exynos 2200 are among the highest-profile deployments.

Arm doesn’t sell chips. It licenses processor IP, delivered as RTL plus a runtime library, and semiconductor companies integrate that RTL into their own SoCs. The Ethos-N series is Arm’s NPU IP for Cortex-A-class applications: smartphones, tablets, set-top boxes, smart cameras, and DTV platforms. The N-series targets these higher-performance applications; the U-series handles microcontroller-class inference. Understanding that distinction matters for database purposes: the Ethos-N78 never ships with Arm’s name on the package. It ships inside SoCs from MediaTek, Samsung, NXP, Rockchip, and dozens of other vendors.

Ethos-N77 to N78—the generational step

The Ethos-N77 was the first-generation Ethos-N NPU targeting the Cortex-A application space. Configurable from 64 to 256 MACs, it delivered 2 to 4 TOPS at INT8/INT16 precision. It found real deployment across Android SoCs and established the Ethos-N software stack, but it had a ceiling: 4 TOPS maximum, no multi-instance scaling, and limited coverage of depthwise convolutions and recurrent network operators. Arm deprecated N77 driver stack support in August 2021, confirming the N78 as the active generation. N77 status: Legacy.

Figure 1. Arm Ethos-N78 NPU. (Source: JPR)

No official Arm block diagram is publicly released; Arm keeps the microarchitecture detail under NDA for licensees. The diagram in Figure 1 is constructed from the Ethos-N78 product brief, Technical Reference Manual references, and developer documentation.

Key architectural elements shown:

The Ethos-N78, announced in 2020 as part of the second-generation Scylla architecture, doubled the peak MAC count to 2,048 and pushed the performance ceiling to 10 TOPS. More important than the raw TOPS increase were three architectural changes. First, many-core scaling: licensees can tile multiple N78 instances to reach higher aggregate throughput, which the N77 did not support. Second, an average 40% reduction in DRAM bandwidth per inference across popular networks: Inception_v3 shows 30% less DRAM traffic, and YOLOv3 at 608 × 608 shows over 55% less. Third, 30% better area efficiency versus the N77, meaning licensees get more TOPS per mm² of silicon without a proportional cost increase.
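To make the DRAM figures concrete, here is a minimal arithmetic sketch. Only the two reduction percentages come from the text above; the 100 MB baseline traffic and the 30 inferences per second are hypothetical placeholders chosen for illustration, not Arm measurements, and the function names are our own.

```python
# Reductions in DRAM traffic per inference quoted in the article (N78 vs. N77).
# YOLOv3 is quoted as "over 55%", so 0.55 is used here as a lower bound.
REDUCTION = {"Inception_v3": 0.30, "YOLOv3_608x608": 0.55}


def n78_traffic_mb(n77_traffic_mb, reduction):
    """DRAM traffic per inference after applying a fractional reduction."""
    return n77_traffic_mb * (1.0 - reduction)


def bandwidth_gb_per_s(traffic_mb, inferences_per_s):
    """Sustained DRAM bandwidth implied by a per-inference traffic figure."""
    return traffic_mb * inferences_per_s / 1000.0


# HYPOTHETICAL inputs, not measured data: 100 MB per inference on the N77,
# running at 30 inferences per second.
BASELINE_MB = 100.0
RATE = 30

for network, reduction in REDUCTION.items():
    after_mb = n78_traffic_mb(BASELINE_MB, reduction)
    before_bw = bandwidth_gb_per_s(BASELINE_MB, RATE)
    after_bw = bandwidth_gb_per_s(after_mb, RATE)
    print(f"{network}: {before_bw:.2f} GB/s -> {after_bw:.2f} GB/s")
```

The point of the sketch is that the saving scales with frame rate: whatever the real per-inference traffic is, a fixed percentage reduction frees proportionally more DRAM bandwidth for the CPU and GPU as the inference rate rises.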

The N78 supports over 90 unique configurations. Licensees set the MAC count, on-chip SRAM size, and vector engine configuration independently, creating an NPU tuned for their specific power, performance, and area budget. Entry-level targets—smart cameras, DTVs, budget smartphones—use 1 to 2 TOPS configurations for super-resolution and upscaling. Mainstream smartphones and smart home hubs run classification, automatic speech recognition, and computational photography at 2 to 4 TOPS. Premium phones and laptops deploy 5 to 10 TOPS configurations covering the full stack: object detection, segmentation, ASR, super-resolution, and face detection simultaneously.
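The tiering described above can be written down as a small lookup. This is only a sketch of the article's three bands; the names `TIERS` and `tiers_for` are illustrative and not part of any Arm tool, and the workload lists simply repeat the examples given in the text.

```python
# (tier name, min TOPS, max TOPS, example workloads), as described in the text.
TIERS = [
    ("entry", 1.0, 2.0, ["super-resolution", "upscaling"]),
    ("mainstream", 2.0, 4.0,
     ["classification", "automatic speech recognition",
      "computational photography"]),
    ("premium", 5.0, 10.0,
     ["object detection", "segmentation", "ASR",
      "super-resolution", "face detection"]),
]


def tiers_for(tops):
    """Return the name of every tier whose inclusive TOPS range contains `tops`."""
    return [name for name, low, high, _ in TIERS if low <= tops <= high]


def workloads_for(tops):
    """Return the example workloads for every tier that matches `tops`."""
    return [w for name, low, high, loads in TIERS
            if low <= tops <= high for w in loads]
```

Because the bands are quoted as ranges, a 2 TOPS configuration matches both the entry and mainstream tiers, and a value between 4 and 5 TOPS matches none; the sketch preserves that rather than inventing boundaries the text does not give.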

Arm’s internal benchmarks show better-than-linear scaling across configuration sizes in ResNet-50. A medium N78 design runs close to 50% faster than the small reference, the large design is approximately 3× faster, and the XL reaches roughly 6× the small design. That is an indication that the larger configurations turn added silicon area into real throughput.

In same-configuration comparisons to the N77, the N78 scores 20% higher on Inception_v3, 40% higher on Inception_v4, approximately 25% higher on VGG-16, and close to 30% higher on YOLOv3. These are not cherry-picked benchmarks—they’re the standard CNN inference suite, and the N78 improves consistently across all of them.
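One way to summarize the four same-configuration gains in a single number is a geometric mean, which is the usual choice for averaging speedup ratios. The sketch below uses the percentages quoted above, treated as exact even though two of them are stated only approximately; the dictionary and function names are our own.

```python
import math

# N78-over-N77 speedup ratios from the text ("approximately 25%" and
# "close to 30%" are rounded to 1.25 and 1.30 for this illustration).
GAINS = {
    "Inception_v3": 1.20,
    "Inception_v4": 1.40,
    "VGG-16": 1.25,
    "YOLOv3": 1.30,
}


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed in log space."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


summary = geometric_mean(GAINS.values())
print(f"Geometric-mean speedup: {summary:.3f}x")
```

With these rounded inputs the summary lands between 1.28× and 1.29×, which is consistent with the article's reading that the improvement is broad rather than concentrated in one network.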

Software stack

The Ethos-N78 software strategy is write once, deploy everywhere. The Vela compiler handles offline compilation for embedded devices using TVM. Android NNAPI handles online interpreted compilation for mobile devices. Both paths are unified across Cortex-A CPU, Mali GPU, and Ethos-N NPU targets—the framework code runs unchanged regardless of which hardware the runtime selects. Supported frameworks include TensorFlow, TensorFlow Lite, PyTorch, and ONNX. The Ethos-N Static Performance Analyzer lets developers profile networks against the N78 before silicon is available, which matters for SoC programs where software development runs in parallel with hardware tape-out.
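The idea that framework code runs unchanged while the runtime picks the hardware can be illustrated with a toy operator-assignment pass. This is a conceptual sketch only: it is not the NNAPI or Arm driver algorithm, the operator-support sets are hypothetical, and every name in it is invented for illustration.

```python
# HYPOTHETICAL operator coverage per backend; real coverage depends on the
# driver stack and is not taken from Arm documentation.
BACKEND_SUPPORT = {
    "npu": {"conv2d", "depthwise_conv2d", "fully_connected", "relu"},
    "gpu": {"conv2d", "depthwise_conv2d", "fully_connected", "relu",
            "softmax", "resize"},
}
# Preferred order; the CPU is assumed to run anything the others cannot.
PREFERENCE = ["npu", "gpu"]


def assign_backends(ops):
    """Pair each op with the first preferred backend that supports it."""
    plan = []
    for op in ops:
        target = next(
            (b for b in PREFERENCE if op in BACKEND_SUPPORT[b]), "cpu")
        plan.append((op, target))
    return plan


def partition(plan):
    """Group consecutive ops assigned to the same backend into segments."""
    segments = []
    for op, target in plan:
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)
        else:
            segments.append((target, [op]))
    return segments


# The application describes its model once; only the plan changes per device.
model = ["conv2d", "relu", "softmax", "custom_nms"]
segments = partition(assign_backends(model))
```

In this toy model the application-side list of operators never changes. A device with different coverage would only need a different `BACKEND_SUPPORT` table, which is the property the write-once strategy relies on.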

Arm Developer Tools bring ML event trace visualizations in Arm Mobile Studio, enabling performance profiling and debugging across the full Cortex-A/Mali/Ethos-N heterogeneous stack. For ISVs building Android AI applications, the toolchain path from framework training to NPU deployment is the same regardless of which N78-equipped SoC the application runs on.

Deployment

The N78 ships in the MediaTek Dimensity 9000 series and the Samsung Exynos 2200, two of the highest-volume Android application processors sold in 2022–2023. Combined with licensing to NXP, Rockchip, and others across smart camera and IoT SoC families, the N78 is one of the highest-volume NPU IP cores ever shipped in silicon. Hundreds of millions of devices running N78 instances shipped before Arm introduced the next generation.

Arm announced the Ethos-N32/N68/N38 family in 2023, which now represents the current generation. The N78 remains in active production across many licensees’ existing SoC families. For ISVs and silicon teams evaluating NPU IP, the N78’s combination of broad deployment, mature toolchain, and validated performance across major Android platforms gives it a long tail—design wins in production today won’t transition to N32/N68/N38 on any short timeline.

Arm went public in September 2023 at a $54 billion valuation. FY2024 revenue was $3.23 billion, up 21% year over year, driven by royalties and licensing from exactly this kind of IP; the Ethos-N78’s royalty stream across MediaTek and Samsung volume alone represents a meaningful contributor to Arm’s ML Group revenue. Arm is fully viable. The Ethos-N NPU program is one of its fastest-growing licensing segments.

The Ethos-N78 isn’t a chip you buy—it’s IP that shipped inside the Android SoCs running in hundreds of millions of pockets. For silicon teams evaluating NPU IP for a new SoC design today, the N32/N68/N38 family is the current roadmap. For ISVs building against deployed Android hardware, the N78 represents a large and stable installed base with a mature toolchain and predictable performance across vendor implementations. Arm’s write-once, deploy-everywhere software strategy means code written for the N78 runs on the N38 without rewriting. That continuity is the real commercial value of licensing from the same IP vendor across generations.

What do we think?

Arm executes the IP licensing model better than anyone in this space, and the Ethos-N78 is the proof. The 90-plus configuration options give licensees genuine flexibility without fragmenting the software stack. The 40% DRAM bandwidth reduction addresses a real system cost driver. The MediaTek and Samsung deployment volumes validate the architecture at scale. The N32/N68/N38 transition is orderly and backward-compatible. For NPU IP selection, the Ethos-N family is the reference point everything else gets measured against.

The Ethos-N78’s deployment in Dimensity 9000 and Exynos 2200 marks an inflection point in consumer AI silicon: the moment a dedicated NPU became standard equipment in mid-to-high-end Android SoCs rather than a premium differentiator. It reset developer expectations: applications now assume NPU acceleration on any device above entry level, and the market for AI-optimized apps shifted from niche to baseline. The N78’s 90-configuration scalability model was what made that shift economically viable across the full range of Android device tiers at once.

Arm’s Ethos NPU is one of the 152 AI processors in our AI Processor Tracking Service, which also lists performance and other specifications for 291 products.

