Super Duper

Posted: 11.22.04

It's all about numbers—who's got the biggest, the fastest, the mostest. The problem is keeping it up. When you're talking about a lot of zeros, then it gets even harder and little blue pills don't help.

Take the number of zeros in a trillion. The word trillion denotes different numbers in American and British usage. In the American system, one trillion equals 10^12. In the British, French, and German systems, one trillion equals 10^18. The system used in the U.S. is not as logical as that used in other countries (like Great Britain, France, and Germany). In these other countries, a billion (bi meaning two) has twice as many zeros as a million, and a trillion (tri meaning three) has three times as many zeros as a million, etc. But the scientific community seems to use the American system, so if you're not a politician and interested in FLOPS, then a trillion FLOPS means you can do 1,000,000,000,000 Floating Point Operations Per Second—that's a lot.
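For anyone who'd rather count zeros with a keyboard than with fingers, here is a minimal sketch of the two naming systems described above. The function names and the little prefix table are made up for illustration; the only facts baked in are the ones in the paragraph (a trillion is 10^12 in the American system and 10^18 in the other one).

```python
# Illustrative sketch only: the names below are hypothetical, not from any library.
# Latin prefix -> n (bi = 2, tri = 3); "million" is treated as n = 1.
PREFIXES = {"million": 1, "billion": 2, "trillion": 3}


def short_scale_zeros(name):
    """American system: every step up adds three more zeros (10^(3n + 3))."""
    return 3 * PREFIXES[name] + 3


def long_scale_zeros(name):
    """The other system: n times as many zeros as a million (10^(6n))."""
    return 6 * PREFIXES[name]


for name in PREFIXES:
    print(f"{name}: 10^{short_scale_zeros(name)} (American) "
          f"vs 10^{long_scale_zeros(name)} (the other system)")

# One trillion FLOPS in the American sense, i.e. one TFLOPS.
TRILLION_FLOPS = 10 ** short_scale_zeros("trillion")
```

The two systems agree on a million and drift apart from there, which is why it pays to say "10^12" when money or FLOPS are involved.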

[Image: SGI's Columbia]

But what if you could do even more? Up until a few weeks ago the most anyone could do was being done in a giant house in Yokohama, Japan, on a giant machine built by NEC for the Japanese government and appropriately called The Earth Machine (officially, the Earth Simulator). The Earth Machine held the world's record for FLOPS for a couple of years, maxing out finally at 35.86 TFLOPS.

Pause for a moment and try to consider that. Your super-duper 4.2-GHz P4 that's keeping the side of your leg warm, on a good day with favorable sunspots and a tuned compiler, might reach 660 MFLOPS, or 240 MFLOPS sustained on big matrices (http://www.tech-report.com/reviews/2001q3/pentium4-2ghz/index.x?pg=5). Now that's pretty damn impressive by itself, 'cause lord knows you need that kind of horsepower to run Word and IE (but not Mozilla). So say we call it 359 MFLOPS. At that rate it takes roughly 100,000 of them, all working flat out, to add up to 35.86 TFLOPS. If you and all your friends in the world, all your family members, everyone you work with and everyone who went to any school with you, all had 4.2-GHz P4s, you still wouldn't match the horsepower of The Earth Machine.
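If you want to check that head count yourself, here is a back-of-the-envelope sketch. It uses only the two figures from the text (35.86 TFLOPS for The Earth Machine and the assumed 359 MFLOPS per P4), and it generously pretends the desktops add up perfectly with no interconnect overhead, which they would not. The function name is made up for illustration.

```python
# Back-of-the-envelope only: assumes perfect scaling, which no real cluster gets.
EARTH_MACHINE_TFLOPS = 35.86  # figure quoted in the text
P4_MFLOPS = 359               # per-desktop figure assumed in the text


def machines_needed(target_tflops, per_machine_mflops):
    """How many machines it takes to reach the target (1 TFLOPS = 1,000,000 MFLOPS)."""
    return target_tflops * 1_000_000 / per_machine_mflops


p4_count = machines_needed(EARTH_MACHINE_TFLOPS, P4_MFLOPS)
print(f"P4s needed to match The Earth Machine: about {p4_count:,.0f}")
```

The answer lands just shy of 100,000, so unless your holiday card list is very long, the point stands.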

The Earth Machine—that's sooo yesterday: 35.86 TFLOPS—poo. Real super-duper computers do 51.87 TFLOPS, like SGI's Columbia they built for NASA with 10,240 Intel Itanium 2 processors—now that's some TFLOPS. (There's a great story about this machine at http://www.sgi.com/features/2004/oct/columbia/columbia_pg2.html.)

 

[Image: Spiders protecting the DoomGene machine]

But if you want into the big league, then you have to go blue—BlueGene that is, son. Now we're talking: 70.72 TFLOPS as of last week—2X what that tired old Earth Machine can do. And here's how proud IBM is about it: you can't find a damn thing on their web page about it. If you really dig you can find a brief mention at http://www.research.ibm.com/bluegene/index.html. Now if I was running IBM (they ask me all the time, but I keep turning them down), I'd be taking out full-page color ads. IBM started this project in 1999 and funded it with $100 million—before any government contracts or grants. The full BlueGene/L machine is being built for the Lawrence Livermore National Laboratory in California and will have a peak speed of 360 teraflops by 2008 or sooner.

IBM's system uses more than 32,000 embedded processors designed for low power and fast, on-chip data movement, whereas SGI has built fast interconnections between more than 10,000 Intel Itanium processors.

Yes, folks, after two years without the technical lead in the supercomputing race, U.S. manufacturers reclaimed pre-eminence in the Super-Duper field last week, as systems designed by IBM and SGI for government contracts were named the world's fastest at the supercomputing conference in Pittsburgh.

But many computer scientists are concerned that U.S. supercomputing is in danger of slipping behind again because the government isn't investing enough in the field. Well, we're busy, y'know, with other things right now.

And guess what? GPUs and VPUs, that's what. Yep, once again graphics will save the day. GPUs are clocking in at 76 GFLOPS. Take 10,000 of them, tightly couple them, and you have a machine potentially capable of 760 TFLOPS. That's more than 10X what BlueGene does today and 2X the 360 TFLOPS planned for the full machine, at a fraction of the cost of BlueGene, Columbia, or The Earth Machine. And think of how fast "Doom3" would run on that puppy. We could call it "DoomGene."
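Here is the DoomGene arithmetic as a rough sketch, reading the 76 figure as 76 billion FLOPS per GPU and assuming, very optimistically, that 10,000 of them scale perfectly. The comparison figures are the ones quoted earlier in this column; the variable names are made up for illustration.

```python
# Rough sketch: assumes 76 billion FLOPS per GPU and perfect scaling across 10,000 GPUs.
GPU_GFLOPS = 76
GPU_COUNT = 10_000

doomgene_tflops = GPU_GFLOPS * GPU_COUNT / 1_000  # 1 TFLOPS = 1,000 GFLOPS

# TFLOPS figures quoted in this column.
rivals = {
    "The Earth Machine": 35.86,
    "Columbia": 51.87,
    "BlueGene/L (as of last week)": 70.72,
    "BlueGene/L (planned peak)": 360,
}

print(f"DoomGene: {doomgene_tflops:.0f} TFLOPS")
for name, tflops in rivals.items():
    print(f"  vs {name}: {doomgene_tflops / tflops:.1f}X")
```

Run it and the "2X" lines up with BlueGene/L's planned 360-TFLOPS peak; against the 70.72 TFLOPS it posted last week, the gap is closer to 10X.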