Is the third processor the third rail?

Posted: 05.23.05
Center of the Universe

By now you have all heard of Ageia, the new company on the scene with a co-processor designed to run physics algorithms. It's a clever and novel idea, and one that has people excited. It also has its fair share of naysayers. No doubt it's not going to be an easy road for the company, but it follows in the interesting and successful footsteps of other co- or third-processor efforts.

The first co-processor in PC land was the 80287 floating-point co-processor introduced by Intel in 1982, and quickly copied by AMD, Weitek, Chips & Technologies, and others. In fact, some of the copiers got into business based on the 287.

That was the beginning of the use of the PC for serious computational work and not just word processing. The floating-point function was eventually assimilated by Pat Gelsinger's miraculous 486 in 1989, and all the 287 suppliers went looking for something else to do.

The next co-processor that tried to wiggle its way into the PC was the DSP. Intel was actually one of the first DSP producers, if not the first, with its 2920, which was introduced in 1979, two years before the x86-based PC. Later Intel introduced the i860, a RISC processor optimized for graphics that was meant to serve as a DSP engine. It never got off the ground and was relegated to an embedded processor in printers.

DSPs were being considered for array processors, audio processors, floating-point front ends for graphics, and various signal processing in communications. But the DSP was not to have a permanent home in Intel's PC, and the venerable x86 core expanded to take on DSP functions. (Although you can find DSPs in audio cards and Nvidia's south bridge, which they call an MCP.)

We also saw the x86 win the CISC/RISC battle, assimilating RISC functions and still offering GP utilization such that the RISC builders were forced into dead-end niches and embedded systems. The x86 with its expanding pipeline and MMX functions was an unstoppable train that would capture every segment of computerland—well, almost every segment.

There were still some big problems the x86 wasn't ready for, like large datasets and 64-bit addressing, and it really wasn't any good at message-passing multi-processor arrays like the Transputer (although it did get built into some early hypercubes), and it would never ever be a VLIW device doing multiple executions in one clock like the Transmeta design and others. But who cares about those esoteric ninny markets anyway?

And then there were two

While Intel was busy stomping the hell out of any new processor entry in the computer market, other companies quietly grew the graphics controller from a simple-minded state machine called a VGA controller to what is now known as a parallel processor GPU, and it has exceeded the august x86 in MIPS, FLOPS, transistors, and memory controller bandwidth.

Intel couldn't stop it; we had to have a display to use a computer, so they tried to join it. For a variety of reasons their efforts were feeble, culminating in the flop of the i740. Intel's acquisitions of Chips & Technologies for mobile graphics, and Real3D for high-end graphics, met the same fate (although they contributed mightily to the company's IP portfolio), and the company grudgingly accepted the second processor, a GPU, but not without a fight, and eventually had its revenge with the outrageous success of its integrated graphics chipsets.

Could there be three?

Barely had the GPU begun to come into its own right when others tried to offer a third processor for hard problems that couldn't be efficiently or quickly solved with a CPU and/or a GPU.

Ray tracing

Advanced Rendering Technology (ART) was founded in Cambridge, England, in March 1995 to develop the world's first ray-tracing graphics processor—the AR250. It was hailed as the long-awaited solution to the time-consuming task of rendering big, beautiful pictures for automotive design and the movies, which had been using powerful (and powerfully expensive) SGI workstations, or farms of less expensive Sun workstations.

Alas it was, and still is, a specialty market, and so the AR250 never became mainstream or found a home in the PC. Nonetheless, ART carried on, put four AR250s on a card and several cards in a chassis, and developed an incredibly cost-effective render farm in a box. The chip has since been updated to the AR350, and ART, I'm happy to report, is doing damn well and has shipped hundreds of their systems to aerospace, automotive, and movie studios, most with a PC on the side.

In Germany in 2003, inTrace GmbH, a spin-off company from Saarland University, Saarbrücken, introduced their real-time ray-tracing technology, which was translated into an FPGA and shown at CeBIT in 2004. The chip and software are already used in the automotive industry by companies such as Audi, BMW, DaimlerChrysler, and Volkswagen. Volkswagen has recently even built a large visualization center on this new technology. However, it remains to be seen if inTrace will have any more success at becoming the third processor than ART has, and most likely inTrace will follow in ART's footsteps.


In 1998 Mitsubishi Electric Research Laboratory (MERL) developed the VolumePro real-time volume rendering chip, the vg500. The chip was realized and we all thought MERL was going to make it, if not in the PC, at least in workstations as a third processor.

The VolumePro technology, however, was sold in 2001 to TeraRecon, where it's being put to work interpreting seismic and medical data in a dedicated workstation (not too unlike the ART story). TeraRecon is also doing quite well, recently reporting a 100% increase in sales for its Aquarius workstation, which is used in 3D medical imaging and is capable of manipulating large thin-slice datasets generated from modern multi-detector CT (MDCT) and MRI scanners. That, in turn, has fueled the rapid adoption of Aquarius by users of MDCT and Picture Archiving and Communication Systems (PACS). So no third processor slot, but no failure either.


For a while, mostly in Colorado, there was serious discussion and planning for a NURBS hardware accelerator. However, a combination of the maturing of the CAD market and the crash of the Internet bubble put those plans and discussion to rest.

Could the world benefit from a dedicated NURBS processor? Sure. Could you get the ISVs and APIs to enable it? Probably not; there's just not enough sales volume for the effort (although OpenGL may have extensions hidden in some register/instruction somewhere for it already).

Image processing

Image processing, as done by video editors, medical analysis, satellite and reconnaissance interpretation, and other 2D pixel manipulations, is a tedious, repetitious operation. Image processing can bring a 2.8-GHz P4 to a grinding miserable recursive loop that seems like it's never going to end, and forget about using the computer for anything else in the meantime.

Companies have developed array processors, often using DSPs, to solve this problem, and within the 3Dlabs VP990 lurk the vestiges of an array processor. One can also be found in Atsana's and NeoMagic's chips.

Aspex, a U.K. company founded in 1999, has also built an architecture that is based on the replication of thousands (4,096) of simple processors interconnected by a flexible communications network to form a "deep" SIMD structure, which they call the Linedancer. It's an amazing bit of technology, but it will not become the third processor in the PC, although it will get hung on a PC much like the voxel and ray-tracing processors have.


Like a GP math co-processor, which is what the earlier floating-point processors (x287) were called, physics is general-purpose enough that it might have a shot at being the third processor. It's not for everyone, but there are enough people doing work involving physics as well as the entertainment industry (movies and games) to make it interesting.

It's early days for Ageia and they're still busy doing their homework, getting API and ISV acceptance, and testing their FPGAs. They haven't even disclosed (publicly) the number of floating-point units, the word lengths, or the clock speeds yet, so to write them off as just another esoteric attempt to augment the esteemed x86, with or without dual cores, is premature.

But their emergence on the PC scene does raise the question: can the PC support a third processor, and does the PC need one? There's no other single application as horizontal as graphics, one for which a simple majority of users need a special accelerator.

Having said that, one has to look for things that the x86 can't do well, and that enough people want to do, to justify dedicated acceleration. Looking at the options in the figure, physics seems the best bet.


If we can genuinely make use of a physics accelerator, could we also develop a dedicated artificial intelligence accelerator? That's an intriguing idea—think of the actor actions and reactions of a really smart AI character or gang of AI characters.

But it's difficult (at least to my thinking) to separate AI from physics. Sure, you can have simple attraction/avoidance functions, but what happens when the AI character actually gets in play? Does he/she run, fall, hide, shoot, ride away or toward you? That's all physics, isn't it? That's why NaturalMotion's stuff is so appealing.

So maybe the future of the third processor isn't physics; maybe it's AI and physics. Now there's something to think about.