
Computing with GPUs and Cells


Robert Dow

They’re not “Gs,” they’re “Ss”

It’s pretty exciting to think of a supercomputer in a chip, but then I guess that all depends on what you’re calling a supercomputer, and I doubt we’d get much consensus. The Computer Desktop Encyclopedia (published by the Computer Language Company) says a supercomputer is:

The fastest computer available. It is typically used for simulations in petroleum exploration and production, structural analysis, computational fluid dynamics, physics and chemistry, electronic design, nuclear energy research, and meteorology. It is also used for real-time animated graphics.

And Wikipedia says:

A supercomputer is a computer that leads the world in terms of processing capacity, particularly speed of calculation, at the time of its introduction. The term “Super Computing” was first used by the New York World newspaper in 1929 to refer to large custom-built tabulators IBM made for Columbia University.

So, like art, what a supercomputer is seems to be in the eye of the beholder. However, there is a group of people who like to rate supercomputers by floating-point operations per second (FLOPS), and when they get a lot of them, a prefix is applied, as in GFLOPS. There is also a benchmark known as the Linpack benchmark, introduced by Jack Dongarra in 1979; it’s a collection of Fortran subroutines for solving systems of linear equations.
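
To make the FLOPS measurement concrete, here is a minimal sketch of the Linpack idea: time the solution of a dense n-by-n linear system and divide the nominal operation count, roughly (2/3)n^3 + 2n^2, by the elapsed time. This is Python with NumPy standing in for the Fortran subroutines, an illustration of the arithmetic rather than the official benchmark.

```python
import time
import numpy as np

def linpack_style_gflops(n: int = 2000) -> float:
    """Time the solve of a random dense n x n system Ax = b and
    convert to GFLOPS using the classic LU operation count."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    start = time.perf_counter()
    x = np.linalg.solve(A, b)  # LU factorization plus triangular solves
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # nominal Linpack operation count
    return flops / elapsed / 1e9

if __name__ == "__main__":
    print(f"~{linpack_style_gflops():.1f} GFLOPS (Linpack-style estimate)")
```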

Supercomputers come in various classes, too; there are shared-memory systems, SIMD and MIMD systems, distributed-memory systems, ccNUMA, and cluster systems. And, as you might imagine, not all of the top supercomputers are commercially available. The top five supercomputers right now are listed on the Top 500 website (http://www.top500.org/lists/2006/06).

The No. 1 machine is the BlueGene/L System, a joint development of IBM and DOE’s National Nuclear Security Administration (NNSA), installed at DOE’s Lawrence Livermore National Laboratory in Livermore, CA. BlueGene/L has occupied the No. 1 position on the last three Top 500 lists. It has reached a Linpack benchmark performance of 280.6 TFLOPS and is the only system ever to exceed the level of 100 TFLOPS. This system is expected to remain No. 1 for the next few editions of the Top 500 list.

However, as mentioned, to get a Linpack measurement you have to run Fortran code. GPUs by themselves can’t run Fortran, but they can be hooked up with a general-purpose processor that can. The Cell, with its Power CPU front end, could run Fortran, and the PS3 has been announced as having a theoretical 2.18 TFLOPS, while an Nvidia 7800 GTX 512 is capable of around 200 GFLOPS and ATI’s X1900 architecture has a claimed performance of 554 GFLOPS. And given that the GF8800 is at least 2X the 7800, we can assume it will come in around half a TFLOPS; since it can’t be measured yet, it has to be calculated.
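
For perspective, that kind of theoretical peak is just a back-of-envelope multiplication: functional units times clock times FLOPs per unit per cycle. Here is a minimal Python sketch; the 1.35-GHz clock and the FLOPs-per-cycle counts are my assumptions for a G80-class part, used only to show how a “half a TFLOPS” ballpark falls out.

```python
def peak_gflops(units: int, clock_ghz: float, flops_per_cycle: float) -> float:
    """Theoretical peak: functional units x clock (GHz) x FLOPs per unit per cycle."""
    return units * clock_ghz * flops_per_cycle

# Assumed, illustrative G80-class figures: 128 stream processors at 1.35 GHz.
print(peak_gflops(128, 1.35, 2))  # multiply-add counted as 2 FLOPs -> ~345.6 GFLOPS
print(peak_gflops(128, 1.35, 3))  # with a co-issued MUL counted too -> ~518.4 GFLOPS
```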

ATI, with Peak Streaming, and Nvidia, with its CUDA, are going to apply GPUs to the scientific computing space (see the CUDA article, beginning on p. 1 of this issue). And there will be a lot of marketing spin associated with it.

One of my peeves is the nomenclature of GP-GPU (General-Purpose Computing on Graphics Processing Units). These are not general-purpose processors; these are SP-GPUs (Specific-Purpose Computing on Graphics Processing Units), very specific. But that’s a marketing battle I won’t win, so why bother? One reason to bother is that I find I have to explain to the press, investors, and even industry companies that a GPU will not run x86 code. And the same is true for the vaunted Cell (although it’s been suggested that with recompiling it could).

Nonetheless, the marketing spin is not the issue. What’s cool is that the 128 floating-point processors in the G80, and probably a like number in the forthcoming R600, can, and will, be used for things other than just polishing pixels. But you have to keep your perspective. From a hardware point of view they are very cost effective per FLOPS, but they are not going to be found in all new supercomputers, and, as pointed out, they are not supercomputers unto themselves. They are co-processors, just like the Cray Computer SIMDs hung on the side of the AMD processors in the Red Storm supercomputer at Sandia.

But I like the bragging rights. “Yeah, got a couple of G80s in this puppy, supercomputers on a chip, y’know. Yep, yep, that’s what I use to play games, a gol’danged supercomputer.”
