Qualcomm’s powerful new HPU: the S4

Posted: 10.07.11

Initially leveraging TSMC’s 28nm process, Qualcomm has announced its Snapdragon S4 class of processors, the first member of which is the MSM8960 with an Adreno 225 GPU. The new 1.5GHz processor (the S4 family will scale up to 2.5GHz) uses Qualcomm’s own microarchitecture: four independent, proprietary ARM Cortex-A15-class CPU cores, a 32-core GPU, a 128-bit SIMD engine, three DSPs, and a handful of hardwired engines for codecs and other special-purpose functions. Basically, it is a heterogeneous processor with five or more kinds of processing engine, an open programming environment, and a fast memory interface and manager.
Qualcomm’s S4 five-processor HPU (Source: Qualcomm)
The new CPU core is compatible with the ARM instruction-set architecture (ISA). It is the long-rumored Krait core, with a three-level cache whose highest and largest level is shared with the GPU and SIMD engine. And despite delivering 50% or more performance than the "Scorpion" core in the MSM8660, it actually uses 25-40% less power.
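Taken at face value, the two headline figures combine into a performance-per-watt gain. The short calculation below is a sketch that assumes both figures apply to the same workload and operating point, which the announcement does not state, and it treats "50% or more" as exactly 1.5x; the helper name is hypothetical.

```python
def perf_per_watt_gain(perf_ratio: float, power_reduction: float) -> float:
    """Relative performance per watt, new core versus old (hypothetical helper).

    perf_ratio: new performance / old performance (1.5 means 50% faster).
    power_reduction: fractional drop in power (0.25 means 25% less power).
    """
    new_power_ratio = 1.0 - power_reduction  # new power / old power
    return perf_ratio / new_power_ratio


# Krait vs. Scorpion headline claims, assumed to hold at the same operating point.
for reduction in (0.25, 0.40):
    gain = perf_per_watt_gain(1.5, reduction)
    print(f"{reduction:.0%} less power -> {gain:.1f}x performance per watt")
```

Under those assumptions, Krait would deliver roughly 2x to 2.5x the performance per watt of Scorpion (1.5 / 0.75 and 1.5 / 0.60).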

Qualcomm gets these results because Krait is a full custom design, not a Cortex-A15 built with an ASIC flow. Qualcomm designed the SIMD unit too, based on the NEON concept; it calls its 128-bit SIMD engine VeNum.
Low Power (LP) runs better at room temperature than LPG does (Source: Qualcomm)
This is the first fully integrated SoC with an LTE/3G modem, and it also supports TD-SCDMA, GSM, EDGE, PS, Wi-Fi, and others. Power management is aggressive: power can be cut to individual circuits, in some cases down to a single flip-flop. The CPU cores and their L2 caches are asymmetric, so voltage and/or frequency can be varied on a per-core basis. This gives Qualcomm the equivalent of a turbo mode, and a better solution than the big-little approach (an alternative design used by some suppliers that pairs a powerful processor such as a Cortex-A15 with a small utility processor such as a Cortex-A9).
Part of the strategy behind the design was to use TSMC’s Poly/SiON process with only LP (low power) transistors rather than LPG transistors, which are similar but tend to leak more at higher temperatures. By avoiding LPG transistors, the company neatly sidesteps the leakage-current and power-drain issues and still hits its performance goals with ample headroom.
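Why per-core voltage and frequency control saves power can be illustrated with the textbook CMOS dynamic-power estimate P = C·V²·f. The sketch below compares a shared rail, where every core follows the busiest one, with per-core scaling. It is a minimal sketch under stated assumptions: the operating-point table, the per-core loads, and the names (OperatingPoint, lowest_opp, total_power) are hypothetical rather than Krait’s real values, and the model ignores static leakage, which is the issue the LP-versus-LPG choice addresses.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    volts: float
    ghz: float


# Hypothetical voltage/frequency table; not Qualcomm's real Krait operating points.
OPPS: List[OperatingPoint] = [
    OperatingPoint(0.85, 0.4),
    OperatingPoint(0.95, 0.9),
    OperatingPoint(1.10, 1.5),
]


def dynamic_power(op: OperatingPoint, cap: float = 1.0) -> float:
    """Switching-power estimate P = C * V^2 * f, in arbitrary units."""
    return cap * op.volts ** 2 * op.ghz


def lowest_opp(required_ghz: float) -> OperatingPoint:
    """Slowest operating point that meets the demand (clamped to the fastest)."""
    for op in OPPS:
        if op.ghz >= required_ghz:
            return op
    return OPPS[-1]


def total_power(demands_ghz: Sequence[float], per_core: bool) -> float:
    if per_core:
        # Asymmetric scaling: each core picks its own voltage and frequency.
        points = [lowest_opp(d) for d in demands_ghz]
    else:
        # Shared rail: every core runs at the busiest core's operating point.
        points = [lowest_opp(max(demands_ghz))] * len(demands_ghz)
    return sum(dynamic_power(op) for op in points)


loads = [1.4, 0.3, 0.3, 0.8]  # hypothetical per-core demand in GHz
print(f"shared rail: {total_power(loads, per_core=False):.2f}")
print(f"per-core:    {total_power(loads, per_core=True):.2f}")
```

With these made-up numbers, per-core scaling needs less than half the dynamic power of the shared rail, because three of the four cores drop to lower voltages and the V² term dominates.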

Better GPU too

The S4 processor family incorporates Qualcomm’s Adreno GPU technology, starting with the Adreno 225 Graphics Processing Unit (GPU).
Qualcomm’s Adreno GPU family (Source: Qualcomm)
Qualcomm claims the Adreno 225 delivers a 50% increase in GPU performance over the previous-generation Adreno 220, and six times the processing power of the Adreno 200.
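The two ratios also imply how the Adreno 220 compared with the Adreno 200. This back-of-the-envelope check is valid only if both claims use the same metric, which is not stated (one says "performance," the other "processing power").

```python
# Qualcomm's relative GPU claims for the Adreno 225.
ADRENO_225_VS_200 = 6.0  # "6 times the processing power of Adreno 200"
ADRENO_225_VS_220 = 1.5  # "a 50% increase in GPU performance over ... Adreno 220"

# If both ratios measure the same thing, the Adreno 220 sits at their quotient.
adreno_220_vs_200 = ADRENO_225_VS_200 / ADRENO_225_VS_220
print(f"Implied Adreno 220 vs. Adreno 200: {adreno_220_vs_200:.1f}x")
```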
The Adreno 225 is a programmable GPU with a Unified Shader Architecture (USA). Qualcomm says it has twice the memory bandwidth of its predecessor, which further improves graphics performance at higher display resolutions. The APIs supported by the Adreno 225 include OpenGL ES 1.1, OpenGL ES 2.0, and DX9.3 for Windows 8.
New features include:
• Increased unified shader flexibility and capability
• Improved texture engines with support for sRGB textures
• Enhanced rasterization hardware with support for multiple render targets, user clip planes, instancing and other advanced features, and improved blt and interrupt performance.
Adreno GPUs also use a binning-based approach to rendering, which reduces memory bandwidth consumption and maximizes concurrency.
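In general, binning (tile-based rendering) splits the screen into tiles, sorts geometry into per-tile bins, and then renders each tile in fast on-chip memory, so intermediate color and depth traffic stays on chip and each finished tile is written to external memory once. The sketch below shows only the binning pass, using a conservative bounding-box test. The tile size, screen size, and the function name bin_triangles are illustrative assumptions; Qualcomm has not described Adreno’s binning hardware at this level.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]
TileKey = Tuple[int, int]  # (tile column, tile row)


def bin_triangles(triangles: Sequence[Triangle], width: int, height: int,
                  tile: int) -> Dict[TileKey, List[int]]:
    """Binning pass: list, for each screen tile, the triangles that may touch it.

    The bounding-box test is conservative: a triangle can be binned into a tile
    it does not actually cover, and the per-tile rasterizer rejects those pixels.
    """
    cols = (width + tile - 1) // tile
    rows = (height + tile - 1) // tile
    bins: Dict[TileKey, List[int]] = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Convert the bounding box to tile indices, clamped to the screen.
        x0 = max(0, math.floor(min(xs)) // tile)
        x1 = min(cols - 1, math.floor(max(xs)) // tile)
        y0 = max(0, math.floor(min(ys)) // tile)
        y1 = min(rows - 1, math.floor(max(ys)) // tile)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(idx)
    return dict(bins)


# Hypothetical 64x64 screen with 32x32 tiles.
tris = [((2, 2), (20, 4), (10, 28)),      # stays inside tile (0, 0)
        ((20, 20), (60, 20), (40, 60))]   # spans all four tiles
bins = bin_triangles(tris, width=64, height=64, tile=32)
for key in sorted(bins):
    print(key, bins[key])
```

A tile renderer would then walk the bins one tile at a time, so, for example, tile (1, 1) only ever processes triangle 1.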

And a world modem

Qualcomm claims the Snapdragon S4 has the first fully integrated world-mode/multimode LTE/3G modem. The modem is actually Qualcomm’s second-generation LTE/3G multimode modem, and its MSM8960 implementation will include the latest LTE Release 9 features, such as SI tunneling for enhanced CSFB performance, eMBMS, and enhanced position location for E911, as well as several IMS-based features such as VoLTE, SR-VCC, RCS, and video telephony.

Look for DSPs in there too

In addition to designing custom CPUs, GPUs, and modems, Qualcomm also designs its own custom DSPs, which it brands Hexagon. These processors have been an integral part of Snapdragon processors since 2006; Qualcomm just hasn’t spoken about them publicly very much, if at all.
These are serious processors. In addition to a memory management unit, symmetric multiprocessing support, and a hypervisor for increased capability, the Hexagon DSPs used in Snapdragon S4 processors have dedicated L1 instruction and data caches and a dedicated L2 cache. They are designed around an interleaved multithreading (IMT) architecture, meaning each thread has its own program counter and registers. The DSP can run multiple applications concurrently, much like a CPU, and because it is designed for ultra-low power, it is well positioned for offloading specific tasks such as audio, sensor processing, video, and image enhancement.
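Interleaved multithreading can be pictured as a pipeline that issues from a different hardware thread each cycle, with each thread keeping its own program counter and registers. The toy model below is a sketch, not Hexagon’s actual pipeline: the thread count, the single "addi" instruction, and the names (HwThread, run_imt) are hypothetical, and a finished thread’s slot simply goes idle, as in a fixed round-robin (barrel) schedule.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Instr = Tuple[str, int, int]  # (opcode, register, immediate); only "addi" is modeled


@dataclass
class HwThread:
    """One hardware thread with its own program counter and register file."""
    name: str
    program: List[Instr]
    pc: int = 0
    regs: List[int] = field(default_factory=lambda: [0] * 4)

    def done(self) -> bool:
        return self.pc >= len(self.program)

    def step(self) -> None:
        op, reg, imm = self.program[self.pc]
        if op != "addi":
            raise ValueError(f"unsupported opcode {op!r}")
        self.regs[reg] += imm  # touches only this thread's registers
        self.pc += 1           # only this thread's program counter advances


def run_imt(threads: List[HwThread]) -> List[Tuple[int, str]]:
    """Issue one instruction per cycle, rotating through threads round-robin.

    A finished thread keeps its slot, which then idles (fixed barrel schedule).
    """
    trace: List[Tuple[int, str]] = []
    cycle = 0
    while not all(t.done() for t in threads):
        t = threads[cycle % len(threads)]
        if not t.done():
            t.step()
            trace.append((cycle, t.name))
        cycle += 1
    return trace


# Three hypothetical threads standing in for concurrent DSP workloads.
audio = HwThread("audio", [("addi", 0, 1), ("addi", 0, 1), ("addi", 1, 5)])
sensor = HwThread("sensor", [("addi", 0, 7)])
video = HwThread("video", [("addi", 2, 3), ("addi", 2, 3)])
trace = run_imt([audio, sensor, video])
print(trace)
```

Cycle 4 is idle because the sensor thread has finished. In general, spacing one thread’s instructions several cycles apart like this is what lets an IMT pipeline hide latency without complex forwarding logic, at the cost of occasional empty slots.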
Qualcomm’s Hexagon DSP roadmap (Source: Qualcomm)
Hexagon DSPs play a substantial role in multimedia, as most multimedia functions can be processed more efficiently on a DSP. Once a function has been offloaded to a DSP on the Snapdragon S4 processor, it is unaffected by user application loads on the CPUs.