Is the third processor the third rail?

Robert Dow
Center of the Universe

By now you have all heard of Ageia, the new company on the scene with
a co-processor designed to run physics algorithms. It’s a clever and
novel idea, and one that has people excited. It also has its fair share
of naysayers. No doubt it’s not going to be an easy road for the company,
but it follows in the interesting and successful footsteps of other co- or
third-processor efforts.

The first co-processor in PC land was the 80287 floating-point co-processor
introduced by Intel in 1982, and quickly copied by AMD, Weitek, Chips
& Technologies, and others. In fact, some of the copiers got into business
based on the 287.

That was the beginning of the use of the PC for serious computational
work and not just word-processing. The floating-point function was later
assimilated by Pat Gelsinger’s miraculous 486 in 1989, and all the co-processor
suppliers went looking for something else to do.

The next co-processor that tried to wiggle its way into the PC was
the DSP. Intel was actually one of the first DSP producers, if not the first,
with its 2920, which was introduced in 1979, two years before the
x86-based PC. Later Intel introduced the i860, a RISC processor optimized
for graphics that was also meant to serve as a DSP engine. It never got off
the ground and was relegated to use as an embedded processor in printers.

DSPs were being considered for array processors, audio processors,
floating-point front ends for graphics, and various signal processing
in communications. But the DSP was not to have a permanent home in Intel’s
PC, and the venerable x86 core expanded to take on DSP functions. (Although
you can find DSPs in audio cards and Nvidia’s south bridge, which they
call an MCP.)

We also saw the x86 win the CISC/RISC battle, assimilating RISC functions
and still offering GP utilization such that the RISC builders were forced
into dead-end niches and embedded systems. The x86 with its expanding
pipeline and MMX functions was an unstoppable train that would capture
every segment of computerland—well, almost every segment.

There were still some big problems the x86 wasn’t ready for, like large
datasets and 64-bit addressing, and it really wasn’t any good at message-passing
multi-processor arrays like the Transputer (although it did get built
into some early hypercubes), and it would never ever be a VLIW device
doing multiple executions in one clock like the Transmeta design and
others. But who cares about those esoteric ninny markets anyway?

And then there were two

While Intel was busy stomping the hell out of any new processor entry
in the computer market, other companies quietly grew the graphics controller
from a simple-minded state machine called a VGA controller to what is
now known as a parallel processor GPU, and it has exceeded the august
x86 in MIPS, FLOPS, transistors, and memory controller bandwidth.

Intel couldn’t stop it; we had to have a display to use a computer,
so they tried to join it. For a variety of reasons their efforts were
feeble, culminating in the flop of the i740. Intel’s acquisitions
of Chips & Technologies for mobile graphics and Real3D for high-end graphics
met the same fate (although they contributed mightily to the company’s
IP portfolio), and the company grudgingly accepted the second processor,
a GPU, but not without a fight, and it eventually had its revenge with
the outrageous success of its integrated graphics chipsets.

Could there be three?

Barely had the GPU begun to come into its own when others tried
to offer a third processor for hard problems that couldn’t be efficiently
or quickly solved with a CPU and/or a GPU.

Ray tracing

Advanced Rendering Technology (ART) was founded in Cambridge, England,
in March 1995 to develop the world’s first ray-tracing graphics processor—the
AR250. It was hailed as the long-awaited solution to the time-consuming
task of rendering big, beautiful pictures for automotive design and
the movies, which had been using powerful (and powerfully expensive)
SGI workstations, or farms of less expensive Sun workstations.

Alas it was, and still is, a specialty market, and so the AR250 never
became mainstream or found a home in the PC. Nonetheless, ART carried
on, put four AR250s on a card and several cards in a chassis, and developed
an incredibly cost-effective render farm in a box. The chip has since
been updated to the AR350, and ART, I’m happy to report, is doing damn
well and has shipped hundreds of their systems to aerospace, automotive,
and movie studios, most with a PC on the side.

In Germany in 2003, inTrace GmbH, a spin-off company from Saarland
University, Saarbrücken, introduced their real-time ray-tracing technology,
which was translated into an FPGA and shown at CeBIT in 2004. The chip
and software are already used in the automotive industry by companies
such as Audi, BMW, DaimlerChrysler, and Volkswagen. Volkswagen has
recently even built a large visualization center on this new technology.
However, it remains to be seen if inTrace will have any more success
at becoming the third processor than ART has, and most likely inTrace
will follow in ART’s footsteps.


Volume rendering

In 1998 Mitsubishi Electric Research Laboratory (MERL) developed the
VolumePro real-time volume rendering chip, the vg500. The chip was
realized and we all thought MERL was going to make it, if not in the
PC, at least in workstations as a third processor.

The VolumePro technology, however, was sold in 2001 to TeraRecon, where
it’s being put to work interpreting seismic and medical data in a dedicated
workstation (not too unlike the ART story). TeraRecon is also doing quite well,
recently reporting a 100% increase in sales for its Aquarius workstation,
which is used in 3D medical imaging and is capable of manipulating
large thin-slice datasets generated by modern multi-detector CT (MDCT) and
MRI scanners. That, in turn, has fueled the rapid adoption of Aquarius
by users of MDCT and Picture Archiving and Communication Systems (PACS).
So no third processor slot, but no failure either.


NURBS

For a while, mostly in Colorado, there was serious discussion and planning
for a NURBS hardware accelerator. However, a combination of the maturing
of the CAD market and the crash of the Internet bubble put those plans
and discussion to rest.

Could the world benefit from a dedicated NURBS processor? Sure. Could
you get the ISVs and APIs to enable it? Probably not; there’s just not enough
sales volume for the effort (although OpenGL may have extensions hidden
in some register/instruction somewhere for it already).

Image processing

Image processing, as done by video editors, medical analysis, satellite
and reconnaissance interpretation, and other 2D pixel manipulations,
is a tedious, repetitious operation. Image processing can bring a 2.8-GHz
P4 to a grinding miserable recursive loop that seems like it’s never
going to end, and forget about using the computer for anything else
in the meantime.

Companies have developed array processors, often using DSPs, to solve
this problem, and within the 3Dlabs VP990 lurk the vestiges of an array
processor. One can also be found in Atsana’s and NeoMagic’s chips.

Aspex, a U.K. company founded in 1999, has also built an architecture
that is based on the replication of thousands (4,096) of simple processors
interconnected by a flexible communications network to form a “deep”
SIMD structure, which they call the Linedancer. It’s an amazing bit
of technology, but it will not become the third processor in the PC,
although it will get hung on a PC much like the voxel and ray-tracing
processors have.


Physics

Like a GP math co-processor, which is what the earlier floating-point
processors (x287) were called, physics is general-purpose enough that
it might have a shot at being the third processor. It’s not for everyone,
but there are enough people doing work involving physics, as well as
the entertainment industry (movies and games), to make it interesting.

It’s early days for Ageia and they’re still busy doing their homework,
getting API and ISV acceptance, and testing their FPGAs. They haven’t
even disclosed (publicly) the number of floating-point units, the word
lengths, or the clock speeds yet, so to write them off as just another
esoteric attempt to augment the esteemed x86, with or without dual cores,
is premature.

But their emergence on the PC scene does raise the question: can the
PC support a third processor, and does the PC need one? Apart from
graphics, there’s no single application so horizontal that a simple
majority of users need a special accelerator for it.

Having said that, one has to look for things that the x86 can’t
do well, and that enough people want to do that dedicated acceleration
is needed. Looking at the options in the figure, physics seems the best


Artificial intelligence

If we can genuinely make use of a physics accelerator, could we also
develop a dedicated artificial intelligence accelerator? That’s an intriguing
idea—think of the actor actions and reactions of a really smart
AI character or gang of AI characters.

But it’s difficult (at least to my thinking) to separate AI from physics.
Sure you can have simple attraction/avoidance functions, but what happens
when the AI character actually gets in play? Does he/she run, fall,
hide, shoot, ride away or toward you? That’s all physics, isn’t it?
That’s why NaturalMotion’s stuff is so appealing.

So maybe the future of the third processor isn’t physics; maybe it’s
AI and physics. Now there’s something to think about.