On GPU computing—it’s not about languages

Robert Dow
Adopters vs Time to Adopt

GPU computing has come into its own this year and is, I think, well understood and accepted by industry and, to a large extent, by many consumers and scientists. Even so, in all three categories of people (those who have been using GPGPU, those who will use it, and those who will never use it) I still encounter some who say, “Huh?” when I speak to them about it.

Aside from that right-hand side of the Gaussian distribution of adopters, the important part is that all of the early adopters get it, many are already using it, and almost all (with the exception of some notable high-visibility developers) are planning to use it.

The obvious benefit of using a massively parallel processor that is ridiculously inexpensive relative to other less powerful processors, combined with the OS and programming tools support, is almost overwhelming.

And it raises some interesting questions. Even with the vast amount of vaunted x86 code in use everywhere, cracks in the technology’s monopoly are starting to show. Highly parallel or multi-threaded programs that were written to run on an x86 are now being ported to the GPU, and the x86 versions of those programs are being relegated to legacy code, already described as the “dusty decks” of the 21st century.

Intel hopes to blunt this trend by offering a highly parallel x86 chip code-named Larrabee, and AMD thinks it might offset the trend with Fusion. But it’s doubtful Larrabee will ever be able to compete on a FLOPS basis with the GPUs, and Fusion will just be an integrated heterogeneous solution that still employs a GPU.

Who is going to use the GPU

The advancement in discrete GPUs is astounding. The current ATI GPU will bring 1,600 32-bit single-precision floating-point processors to developers and users, with something north of 2.5 TFLOPS for under $500. Later in the year Nvidia will offer a GPU with fewer processors but a claimed richer implementation of the IEEE 754 floating-point specification, also in the $500 price range. Furthermore, these massively parallel processors will have a minimum of 1 GB of tightly coupled GDDR5 memory running at outrageous speeds, almost double that of the DDR3 system memory of conventional x86 processors used in PCs and servers.

But let’s not get carried away with the numbers. Remember, the GPUs will only run vector-based SIMD code. Parallel versions of the x86 such as Larrabee will be able to do that and also run the more complex MIMD code sets. And the x86 is a 64-bit double-precision floating-point processor with IEEE 754 support, plus a small, dedicated SIMD engine, running at speeds greater than 3 GHz. For single-stream applications, multi-stream applications (currently up to eight streams), and MIMD applications with huge datasets, it will be difficult to beat the venerable x86.
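For readers who want the SIMD versus MIMD distinction in concrete terms, here is a minimal Python sketch. It only illustrates the shape of the two kinds of work; plain Python does not run either one on a GPU or on SIMD hardware, and the function names (`saxpy`, `word_count`, `checksum`, `run_mimd`) are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy(a, xs, ys):
    """SIMD-shaped work: the same operation applied to every element."""
    return [a * x + y for x, y in zip(xs, ys)]


def word_count(text):
    """One independent task: count the words in a string."""
    return len(text.split())


def checksum(data):
    """A different independent task: a toy 8-bit checksum."""
    return sum(data) % 256


def run_mimd(tasks):
    """MIMD-shaped work: different instruction streams, each with its own data."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        return [f.result() for f in futures]


# One operation, many data elements: the kind of job a GPU is built for.
simd_result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])

# Unrelated jobs running side by side: the kind of job a multi-core x86 handles well.
mimd_result = run_mimd([(word_count, "the venerable x86"), (checksum, [200, 100, 7])])
```

The first call does identical arithmetic on every element, so it could in principle be spread across hundreds of simple processors. The second runs two unrelated routines at once, which needs cores that can each follow their own program.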

Winners and losers

And who wants to anyway? If it ain’t broke, don’t fix it. The GPU isn’t going to make Excel, Word, or your browser run any faster. So contrary to the desires of the press and fund managers, there won’t be a winner and a loser with the expansion of the GPU for computing operations. The press and fund managers bring a sensationalist football game mentality to technology. Rather than invest the time to understand it, they want it to be a repeat of the CD killing 8-track, or Blu-ray defeating HD-DVD. And in all fairness, it’s not like there aren’t plenty of examples to call on, but this time it’s different. In the case of GPU compute, it’s an augmentation, not a replacement, and that really confounds the press and investors, because neither thinks there is any money to be made on that kind of development.

It also seems to confuse the players. Intel sees the discrete GPU as a threat, and the GPU suppliers seem hell-bent on proving the x86 is a waste of silicon. The fact is they need each other, and if we’re going to move the industry forward they’ve got to stop wasting time sniping at each other.

At IBC last year I surveyed most of the software application and tools companies and asked whether they were using the GPU for compute, and if not, whether they planned to, and if not, why. A very small number said they were, a larger number said they were planning to, and about a third said no, never.

This year almost everyone reported they were using, or were planning to use, the GPU for compute in some part of their application or pipeline, and several had examples running. They were showing speedups of 10x to 200x over an x86 CPU alone on some functions. Matrox and a couple of other companies that don’t have a GPU were showing acceleration using FPGAs.
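Speedups “on some functions” are not the same as speedups for a whole application. A quick way to see the difference is Amdahl’s law, sketched below in Python. The fractions of run time used in the examples are hypothetical, chosen only for illustration; they are not figures from the companies surveyed.

```python
def overall_speedup(accelerated_fraction, function_speedup):
    """Amdahl's law: whole-application speedup when only part of the run time is accelerated.

    accelerated_fraction: share of the original run time spent in the accelerated function (0 to 1).
    function_speedup: how much faster that function runs on the GPU (e.g. 10 or 200).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if function_speedup <= 0:
        raise ValueError("function_speedup must be positive")
    remaining = 1.0 - accelerated_fraction  # part still running on the CPU at the old speed
    return 1.0 / (remaining + accelerated_fraction / function_speedup)


# Hypothetical case: half of the run time is in the accelerated function.
half_at_10x = overall_speedup(0.5, 10)    # a 10x function speedup
half_at_200x = overall_speedup(0.5, 200)  # a 200x function speedup

# Hypothetical case: 90% of the run time is in the accelerated function.
most_at_200x = overall_speedup(0.9, 200)
```

If only half of the run time is accelerated, even a 200x function speedup cannot quite double the application’s speed, because the other half still runs on the CPU at its old pace. That is one reason GPU compute works as an augmentation of the x86 and not a replacement for it.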

The time has come

The point to be understood by the press, investors, and the players is that we now have an environment where the OS can do a pretty good job of balancing the load across a heterogeneous group of processors. These heterogeneous processors can each do certain jobs better, which makes the overall operation faster. Not only that, with each processor handed the specific jobs for which it is better suited, every processor is relieved of work it does poorly and can apply more resources to the things it does best. That’s a win-win situation.

And that’s where we are now: GPU computing has arrived, it’s never going to go away, applications that can be multi-threaded will take advantage of it, and we users will benefit greatly from it. Debates will continue about which programming paradigm to use: CUDA, OpenCL, or DirectX 11 compute. If you’re not personally doing the programming, this is a useless debate, and you might as well invest your time in fantasy football. The choice of programming tools is a marketing debate between the GPU suppliers, and apart from any team loyalty you may carry, it has no impact on you personally. Each side will tell you how its programming tools are better and how the other team can’t do certain compute functions. You know what? You don’t care. You’re just a simple user, and when and if benchmarks are ever available, you’ll see very little variation because of those programming differences. It’s the heterogeneous processing paradigm that’s making all the difference.

GPU compute is being applied to scientific, academic, and commercial applications, as well as some video-based consumer applications. GPU computing will soon be used so widely and transparently that in a year’s time it won’t even be discussed. Welcome to the future.