Graphics processors are a parallel universe

Posted: 04.19.04

[Image: Single-lane processing]

It's spring and a new crop of graphics processors is being readied for market, and I'm very excited about what's coming. I was going to say graphics processors are so cool, but then I would have had to add seventeenleven sentences explaining that I was using the vernacular sense of the word and not the thermal sense, for heaven knows these puppies ain't cool thermally. But they are damn cool, and as I've been saying for a few years now, the CPU is the co-processor, and the GPU-VPU is the processor, period, end of discussion.

[Image: Parallel processing]

You can see the outpacing of the CPU in so many ways. First of all, the GHz race is now officially over: Intel said so. Never mind that AMD, Transmeta, and VIA already abandoned that silly metric (it's like gauging a car's performance on how many RPMs the engine can hit); Intel said it now, so it's official. Intel said it for a couple of reasons: for one, they can't hit the high notes as easily anymore, and for another, the users are bored with it. It was meaningless when it was started and it has even less meaning to the user today. ("Ah, how many GHz do I need to run a 200 by 1280 spreadsheet?") As for the high notes, as you know, and I've commented on this in other parts of this week's issue, the scaling-by-shrinking of CMOS transistors is about to come to an end, and with it the increases in GHz.

What we're doing with transistors to get the game rolling is making them taller, and that will work for a while. What we're doing with processors to make them more powerful is run them in parallel, and Intel's Hyper-Threading was a half step in that direction. But parallel is the universe graphics lives in—it naturally scales in parallel, and you can see that with the first GPUs that bragged about multiple pipes, and with 3Dlabs's P10.

The difference between the lowly but noble CPU and the mighty and regal GPU is that GPUs scale naturally with the apps, whereas CPUs require the apps to be rewritten and recompiled. I can hear Craig Barrett calling Larry Ellison, "Hey, Lar, we got a new gizmo and we need you to recompile all your code for it. When can you have that ready?"
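To make the "scales naturally with the apps" point concrete, here is a toy sketch in Python. It is only an illustration, not how any real GPU or driver works: the `shade` function, the round-robin split, and the pipe counts are all made up for the example. The idea it shows is that when the work is one small function applied to every pixel, the same "app" produces the same frame whether the hardware has 4 pipes or 16; only the number of passes changes.

```python
def shade(pixel: int) -> int:
    """Hypothetical per-pixel operation: invert an 8-bit value."""
    return 255 - pixel


def render(pixels: list[int], pipes: int) -> list[int]:
    """Deal pixels round-robin to `pipes` lanes and shade each lane.

    The lanes are processed one after another here; on real hardware
    they would run side by side. The app code (shade) never changes.
    """
    out = [0] * len(pixels)
    for lane in range(pipes):
        # Every pipes-th pixel, starting at this lane's offset.
        out[lane::pipes] = [shade(p) for p in pixels[lane::pipes]]
    return out


def passes_needed(pixel_count: int, pipes: int) -> int:
    """How many rounds it takes when each pipe handles one pixel per round."""
    return -(-pixel_count // pipes)  # ceiling division


frame = list(range(16))
four_pipes = render(frame, 4)
sixteen_pipes = render(frame, 16)
print(four_pipes == sixteen_pipes)                   # same picture either way
print(passes_needed(16, 4), passes_needed(16, 16))   # fewer passes with more pipes
```

Nothing in `shade` had to be rewritten to go from 4 pipes to 16, which is the contrast with the recompile-everything phone call above.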

With its parallel architectures, the GPU—maybe I'll start referring to it as the MP (the Main Processor)—scales out and up, so to speak, giving it an exponential curve that exceeds the Moore's Law curve.

But wait, it gets even better. When Longhorn comes with WinFS, all those full IEEE 32-bit floating-point processors in the GPU/VPUs (i.e., the "MPs") can be put to other uses besides just crunching and enhancing pixels. One of Longhorn's big goals is a database engine that will find things for you like a really smart agent. You say, Where's that file that I sent to Jim, or was it Jerry, last month, or maybe last year, about the whatchamacallit? Now to do a search for that kind of abstraction you need a lot of horsepower (not to mention one hell of a meta file on each file), and with 32 or 48 floating-point processors sitting on a PCI Express gateway you'll have processing capabilities only envisioned in science fiction stories.

There have already been some such uses: national laboratories and universities have used the floating-point processors in GPUs as parallel processors. Those guys are going to be ecstatic when they see the new crop. And they are cheap, cheap, cheap! Four to six scalar units, four to six vector units, and 16 FPUs plus miscellaneous other little thingies to crunch special numbers, for what, maybe $300? And what does a Prescott sell for with its lowly single processor and make-believe HT stuff?

The GPU/VPUs are also much more intimate with their memory than a CPU is, they have as much memory as a CPU (if not more), and the GPU/VPU's memory runs faster. And what makes a computer fast and useful, class? That's right, a lot of fast memory.

So the core processor on a CPU may run like hell, 3.5 GHz, and a GPU core may only run at 550 MHz, but it's like a VLIW device in that it's doing 16 x 32 x 500 million bits per second (16 pipes, 32 bits each, rounding the clock down to 500 MHz) as compared to 32 x 3,500 million a second, or the equivalent of 8 GHz. And next year when the CPU gets up to 4 GHz, the GPU will be the equivalent of 19 GHz (assuming a 600-MHz core and 32 pipes).
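For anyone who wants to check the back-of-the-envelope math, here is a small Python sketch. It assumes the "equivalent GHz" figure means the clock a single 32-bit-wide processor would need to move the same number of bits per second as the GPU, which is the reading that reproduces both numbers above. The function name and parameters are made up for the example, and the 16-pipe, 500-MHz inputs follow the multiplication in the text rather than the 550-MHz clock quoted beside it.

```python
WORD_BITS = 32  # width of one floating-point value, per the article


def equivalent_ghz(pipes: int, core_mhz: float) -> float:
    """Clock (in GHz) a single 32-bit processor would need to match
    a GPU with `pipes` 32-bit pipes running at `core_mhz`."""
    gpu_bits_per_second = pipes * WORD_BITS * core_mhz * 1e6
    # One 32-bit processor moves WORD_BITS bits per clock tick.
    return gpu_bits_per_second / WORD_BITS / 1e9


this_year = equivalent_ghz(pipes=16, core_mhz=500)
next_year = equivalent_ghz(pipes=32, core_mhz=600)

print(f"this year: {this_year:.1f} GHz equivalent vs. a 3.5 GHz CPU")
print(f"next year: {next_year:.1f} GHz equivalent vs. a 4.0 GHz CPU")
```

The word width cancels out, so the comparison boils down to pipes times clock: 16 x 500 MHz is 8 GHz, and 32 x 600 MHz is 19.2 GHz, which rounds to the 19 GHz quoted above. It is a throughput comparison only and says nothing about how much of that the apps can actually keep busy.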

Of course, we'll have to use cryogenic cooling to get these things to work, but that just adds to the excitement. "What'sa matter, Ralph?" "My damn CO2 tank ran out and my machine is shutting down—why can't IT keep these things filled?"

I'm telling ya, these graphics thingies are cool, cool, cool!