Posted: Jon Peddie 07.09.18
I was reminiscing last week about the GPU and having a friendly debate with my pals at Nvidia about the origin of the acronym. They, of course, claim they invented both the device and the acronym, and within certain qualifications, they did.
The term was first used by Sony in 1994 with the launch of the PS1. That system had a 32-bit Sony GPU (designed by Toshiba). The acronym was also used, both before and after that, to refer to a geometry processing unit. TriTech introduced its Geometry Processor Unit in 1996, and Microsoft licensed it from them in 1998. It was part of a multi-chip solution and used the OpenGL API.
3DLABS introduced the Glint multi-chip set in 1995 with a geometry processor unit (later integrated into one chip), targeting the professional graphics workstation market, at the time the most demanding in terms of performance. Nvidia targeted its device at the gaming community, which was smaller but growing rapidly. Five or six years later the gaming market took off, taking Nvidia with it, while the workstation market flattened out and no longer provided enough sales for companies like 3DLABS to continue investing in new semiconductor designs. Soon Nvidia was able to adapt its device to the professional graphics market too, increasing the competitive pressure on dedicated graphics processor suppliers.
Since 2000, the term GPU, as applied to a geometry processing unit, has been used frequently and appears in dozens of patents.
During that same period, researchers at universities, always on the hunt for more computing power at less expense, began experimenting with the processors in gaming consoles, such as the Cell in the PS3, and with the GPUs from ATI and Nvidia that were used in them.
Ironically, the way they chose to program the GPU for computing applications was through OpenGL because it exposed more of the GPU’s capabilities.
Today, we find the GPU being used for artificial intelligence inferencing in mobile phones and automobiles, for AI training at various companies and government agencies, for cryptocurrency mining, for scientific, medical, and engineering application acceleration, and for robotics, to name a few of the most common workloads. The GPU is reducing the time it takes researchers and engineers to analyze, design, and diagnose problems that in the past would have taken days to weeks, and in some cases, like protein folding, months. Not only are answers to complex and complicated problems being obtained sooner, they are also more accurate: one of the compromises traditionally made in large computations was to reduce accuracy so that an answer could be obtained within a lifetime.
But is it still a graphics processing engine? Clearly not. Cases in point are the Nvidia Volta, with its tensor cores, and the Vega from AMD. Intel, too, will be entering the GPU market and will bring its vast AI capabilities to its offerings.
It is an SoC: a parallel processor with associated special-function engines such as a video codec, a rasterizer, a neural-net accelerator, and DSPs for audio and image processing.
Today’s chips are massive devices. Nvidia’s Volta, for example, is the largest chip ever made, measuring 815 mm². Crammed into that 12 nm die are 21.1 billion (with a B) transistors, many of them devoted to the 5,376 32-bit floating-point cores configured in a SIMD architecture, making it the biggest integrated parallel processor ever built. It is likely to hold that title for quite a while, because as feature sizes go down, so do yields, making these giant chips harder to build and more expensive.
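For readers who want the flavor of what “SIMD” means in that sentence, here is a minimal, pure-Python sketch of a single instruction being applied across many data lanes in lockstep. It is a conceptual illustration only, not how the silicon is actually programmed:

```python
# Conceptual sketch of SIMD execution: one instruction, many data lanes.
# Each "core" applies the same fused multiply-add to its own element.
LANES = 5376  # matching the FP32 core count cited above

def simd_fma(a, b, c):
    """Apply d = a*b + c across all lanes, one lockstep 'instruction'."""
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]

a = [1.0] * LANES
b = [2.0] * LANES
c = [0.5] * LANES
d = simd_fma(a, b, c)   # every lane computes 1.0*2.0 + 0.5

print(len(d), d[0])
```

A real GPU issues that one operation across its lanes in hardware every clock cycle, which is where the parallel-processing throughput comes from.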
They are also prodigious consumers of data and demand fast, tightly coupled memory with the highest bandwidth possible to feed all those 32-bit ALUs. To try to satisfy that demand, AMD and Nvidia have attached up to 32 GB of stacked high-bandwidth memory (HBM), with up to 900 GB/s of peak memory bandwidth, to their processors.
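The scale of that demand is easy to see with some back-of-the-envelope arithmetic. The sketch below takes the 5,376-core and 900 GB/s figures from above and assumes a representative ~1.5 GHz clock (an assumption for illustration, not a vendor spec):

```python
# Rough arithmetic on why GPUs need such wide memory pipes.
cores = 5376            # 32-bit floating-point cores (from the text)
clock_hz = 1.5e9        # assumed boost clock, ~1.5 GHz (illustrative)
flops_per_core = 2      # one fused multiply-add = 2 FLOPs per cycle

peak_flops = cores * clock_hz * flops_per_core   # ~16 TFLOPS peak

# If every FMA pulled two 4-byte operands from DRAM and wrote one back:
bytes_per_fma = 12
naive_bw = (peak_flops / flops_per_core) * bytes_per_fma  # bytes/s

hbm_bw = 900e9          # peak HBM bandwidth from the text (bytes/s)

print(f"peak compute : {peak_flops / 1e12:.1f} TFLOPS")
print(f"naive demand : {naive_bw / 1e12:.1f} TB/s")
print(f"HBM supplies : {hbm_bw / 1e9:.0f} GB/s "
      f"({naive_bw / hbm_bw:.0f}x less than naive demand)")
```

Even 900 GB/s covers only about one percent of the naive demand; register files, caches, and data reuse have to close the rest of the gap, which is why so much of the die is spent on local memory.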
And it doesn’t stop there. If one of these monsters is good, shouldn’t 2, or 4, or 16 be even better? The answer is yes, of course. The GPU is inherently capable of scaling, but to do it you need a super-high-speed communications network, commonly referred to as a fabric today. Intel has one, and so does Nvidia. AMD has one in its Epyc CPU for linking all those x86 Zen cores. Nvidia calls its chip-to-chip fabric NVLink, and it moves data at up to 300 GB/s from one Volta GPU to another. AMD’s Infinity Fabric’s Scalable Control Fabric (SCF) hits 41.4 GB/s within the chip.
All these techniques are modern-day versions of the parallel-processor designs developed in the late 1980s and built in big racks. Those machines were laughably slower than the SoCs in our smartphones today, but the communications schemes and the allocation of localized high-speed memory are the same, just a zillion times faster and larger in terms of ALUs. We owe all that to Moore’s law and to the amazing machines in those amazing fabs that make building 7 nm silicon systems possible.
And as interesting and mind-bending as all that is, it still leaves us with the need for a better name for these massive parallel-processor SoCs. No doubt the clever marketing people at one supplier or another will coin a term, so I’m not going to offer one, but I can predict that the word “accelerate” or “accelerator” will probably be in it, as will the term massive, or large (remember VLSI?). And there will be a lot of fun in naming these monster chips.
But what about the folks who still want and need a good-to-great graphics accelerator? As exciting as AI and application acceleration are, the volume for these massive processors in those applications is a fraction of what is sold for gaming, photo and video editing, and professional graphics. For that very large population, we will still need and appreciate the tried and tested GPU.
So just as a cell splits in two in the process of life, maybe it’s time for the GPU to split in two and spawn its new, bigger, more powerful sibling—the parallel-processing application accelerator unit—PPAAU. Oh damn! I named it.