It’s all about numbers—who’s got the biggest, the
fastest, the mostest. The problem is keeping it up. When you’re talking
about a lot of zeros, then it gets even harder and little blue pills
don’t help.
Take the number of zeros in a trillion. The word trillion
denotes different numbers in American and British usage. In the American
system, one trillion equals 10^12. In the British, French, and German
systems, one trillion equals 10^18. The system used in the U.S. is not
as logical as that used in other countries (like Great Britain, France,
and Germany). In these other countries, a billion (bi meaning two) has
twice as many zeros as a million, and a trillion (tri meaning three)
has three times as many zeros as a million, etc. But the scientific
community seems to use the American system, so if you’re not a politician
and interested in FLOPS, then a trillion FLOPS means you can do 1,000,000,000,000
Floating Point Operations Per Second—that’s a lot.
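If you want to keep the two naming systems straight, here is a minimal Python sketch of the rule described above. The function names and the little prefix table are made up for illustration; the only facts it leans on are the ones in the text (American trillion = 10^12, the other system's trillion = 10^18).

```python
# Hypothetical helper names; the prefix table only covers the words used above.
# Latin prefix index: "bi" = 2, "tri" = 3, and we treat "million" as 1.
PREFIX_INDEX = {"million": 1, "billion": 2, "trillion": 3}

def american_exponent(name):
    """Power of ten in the American system: each step adds three zeros."""
    return 3 * PREFIX_INDEX[name] + 3

def british_exponent(name):
    """Power of ten in the other system: n times the zeros of a million."""
    return 6 * PREFIX_INDEX[name]

for name in PREFIX_INDEX:
    print(f"{name}: American 10^{american_exponent(name)}, "
          f"British/French/German 10^{british_exponent(name)}")
```

The second function is just "a million's six zeros, times the prefix," which is why that system looks tidier on paper.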
[Image: SGI’s Columbia]
But what if you could do even more? Up until a few weeks
ago the most anyone could do was being done in a giant house in Yokohama,
Japan, on a giant machine built by NEC for the Japanese government and
appropriately called the Earth Simulator (The Earth Machine from here on). The Earth Machine held the world’s
record for FLOPS for a couple of years, maxing out finally at 35.86
TFLOPS.
Pause for a moment and try to consider that. Your super-duper
4.2-GHz P4 that’s keeping the side of your leg warm, on a good day with
favorable sunspots and a tuned compiler, might reach 660 MFLOPS (240 sustained
on big matrices) (http://www.tech-report.com/reviews/2001q3/pentium4-2ghz/index.x?pg=5).
Now that’s pretty damn impressive by itself, ’cause lord knows you need
that kind of horsepower to run Word and IE (but not Mozilla). So say
it’s good for 359 MFLOPS. If you and all your friends in the world,
all your family members, everyone you work with, and everyone who went
to any school with you all had 4.2-GHz P4s, you still wouldn’t match
the horsepower of The Earth Machine; that would take about 100,000 of them.
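For the skeptics, the back-of-the-envelope arithmetic can be sketched in a few lines of Python. The two figures come straight from the text; the function name is invented here, and the big assumption (flagged in the comments) is that desktop PCs add up perfectly with no interconnect overhead, which real machines never manage.

```python
# Figures quoted in the text, written as plain integers (FLOPS).
EARTH_MACHINE_FLOPS = 35_860_000_000_000  # 35.86 TFLOPS
P4_FLOPS = 359_000_000                    # 359 MFLOPS per desktop

def machines_needed(target_flops, per_machine_flops):
    """Smallest whole number of machines whose combined FLOPS reach the target.

    ASSUMPTION: perfect linear scaling, i.e. no communication overhead at all.
    """
    # Integer ceiling division: round up without touching floating point.
    return -(-target_flops // per_machine_flops)

print(machines_needed(EARTH_MACHINE_FLOPS, P4_FLOPS))
```

Under that generous assumption the answer lands just shy of 100,000 desktops, so you would need a lot of friends.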
The Earth Machine—that’s sooo yesterday: 35.86 TFLOPS—poo.
Real super-duper computers do 51.87 TFLOPS, like Columbia, which SGI
built for NASA with 10,240 Intel Itanium 2 processors. Now that’s
some TFLOPS. (There’s a great story about this machine at http://www.sgi.com/features/2004/oct/columbia/columbia_pg2.html.)
[Image: Spiders protecting the DoomGene machine]
But if you want into the big league, then you have to
go blue—BlueGene that is, son. Now we’re talking: 70.72 TFLOPS
as of last week—2X what that tired old Earth Machine can do. And
here’s how proud IBM is about it: you can’t find a damn thing on their
web page about it. If you really dig you can find a brief mention at
http://www.research.ibm.com/bluegene/index.html. Now if I was running
IBM (they ask me all the time, but I keep turning them down), I’d be
taking out full-page color ads. IBM started this project in 1999 and
funded it with $100 million—before any government contracts or
grants. The full BlueGene/L machine is being built for the Lawrence
Livermore National Laboratory in California and will have a peak speed
of 360 teraflops by 2008 or sooner.
IBM’s system uses more than 32,000 embedded processors
designed for low power and fast, on-chip data movement, whereas SGI
has built fast interconnections between more than 10,000 Intel Itanium
processors.
Yes, folks, after two years without the technical lead
in the supercomputing race, U.S. manufacturers reclaimed pre-eminence
in the Super-Duper field last week, as systems designed by IBM and SGI
for government contracts were named the world’s fastest at the supercomputing
conference in Pittsburgh.
But many computer scientists are concerned that U.S.
supercomputing is in danger of slipping behind again because the government
isn’t investing enough in the field. Well, we’re busy, y’know, with
other things right now.
And guess what? GPUs and VPUs, that’s what. Yep, once
again graphics will save the day. GPUs are clocking in at 76 GFLOPS.
Take 10,000 of them, tightly couple them, and you have a machine potentially
capable of 760 TFLOPS: 2X the planned peak of the full BlueGene/L, and at a fraction of the cost
of BlueGene, Columbia, or The Earth Machine. And think of how fast “Doom3”
would run on that puppy. We could call it “DoomGene.”
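The DoomGene math is simple enough to check in Python. This is only a sketch of the multiplication above: the variable names are made up, the per-GPU figure is read as 76 billion FLOPS, and it assumes (optimistically, as the word "potentially" admits) that 10,000 GPUs scale perfectly.

```python
# Figures from the text; names are illustrative only.
GPU_GFLOPS = 76            # per-GPU rate, read as 76 billion FLOPS
GPU_COUNT = 10_000         # hypothetical tightly coupled cluster
BLUEGENE_NOW_TFLOPS = 70.72    # BlueGene as measured last week
BLUEGENE_FULL_TFLOPS = 360     # planned peak of the full BlueGene/L

# ASSUMPTION: perfect scaling, so aggregate rate is just rate times count.
doomgene_tflops = GPU_GFLOPS * GPU_COUNT / 1_000  # GFLOPS -> TFLOPS

print(f"DoomGene: {doomgene_tflops:.0f} TFLOPS")
print(f"vs. BlueGene today: {doomgene_tflops / BLUEGENE_NOW_TFLOPS:.1f}X")
print(f"vs. full BlueGene/L: {doomgene_tflops / BLUEGENE_FULL_TFLOPS:.1f}X")
```

Against the planned 360-teraflop machine the ratio comes out a bit over 2X; against the 70.72 TFLOPS measured last week it is closer to 11X.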