Nvidia, CUDA, and x86

Nvidia has announced its CUDA-on-x86 technology, which lets developers move to CUDA gradually: they can develop, debug, and run code on x86 and get started with CUDA even if most of their legacy code still targets the CPU. By letting developers work with both CPUs and GPUs, Nvidia may also be giving them a chance to see for themselves the advantages GPUs hold over CPUs.

Robert Dow

It’s not what you’re thinking. Announced last week at the company’s GPU Technology Conference (GTC), “CUDA on x86” has nothing to do with Nvidia getting its hands on x86 IP, a notion that’s been the fodder for constant speculation over the past few years. Rather, what CUDA-on-x86 accomplishes is the ability to run CUDA-written applications on x86 hardware. The cross-compiler is the result of a collaboration between Nvidia and PGI (The Portland Group).

A knee-jerk reaction might be that such a move runs counter to Nvidia’s long-term strategy. The company is in the midst of trying to get applications off CPUs and onto GPUs, so why would it help developers port their previously Nvidia-locked CUDA apps to x86?

Well, this move is aimed not so much at existing CUDA developers as at new ones.

The stated motivation for CUDA on x86, according to Nvidia, is to ease the burden on developers looking to make the jump to CUDA. The compute infrastructure of interested developers might be light on GPUs but heavy on x86 servers, and with substantial legacy code in place, many developers would rather build up GPGPU supercomputing gradually. CUDA x86 gives them a bridge that makes the transition to CUDA easier. By letting developers develop, debug, and run on x86 early on, or keep x86 as a secondary platform as needed, Nvidia may make them more inclined to allocate the resources to get CUDA porting done.

But there also might be a nice marketing angle for Nvidia at work here. As a bridge to get more developers to move gradually to GPUs, CUDA is simply supposed to work reliably on x86, not perform optimally. So, just as an example, there’ll be no need for CUDA x86 to take advantage of the new AVX instructions on Intel’s Sandy Bridge, which double floating-point SIMD width and could substantially improve an application’s performance on x86. (Now, we’re not saying CUDA x86 won’t support AVX; we’re simply saying that doing so would be outside the technology’s stated goals.)

Ultimately, here’s the bottom line: a CUDA application running on x86 is going to be slower than an application optimized for x86 without CUDA, probably a lot slower. So a developer who runs a CUDA application on x86 and then on Fermi will see a larger speed-up than if he had first optimized for a conventional, non-CUDA x86 platform. Bigger speed-up numbers serve Nvidia’s purpose of showcasing how much faster GPUs are than CPUs on many floating-point-intensive applications.

What hypothetically might have been a 10X gain from moving to GPUs might turn into 100X. That not only gives the developer in question more motivation to hustle up and get on Nvidia hardware, it also hands Nvidia’s marketing machine great ammunition for furthering its GPU Compute campaign. So while at first glance one might question the wisdom of letting CUDA apps cross over to x86, upon further inspection I think quite the opposite, for reasons both stated and not.