
Famous graphics chips—multi GPUs

The promise and failure of scaling add-in boards

Posted: By Jon Peddie 09.24.20

When 3D graphics controllers were just emerging in the late 1990s, one company in particular, 3Dfx, experimented with ways to scale up 3D acceleration performance. Its idea was scan-line interleave (SLI), introduced in 1998 as part of its second-generation chip, Voodoo2. In SLI mode, two Voodoo2 add-in boards (AIBs) could run in parallel, with each one drawing every other line of the display. The original Voodoo Graphics also had SLI capability, but it was only used in the arcade and professional markets.
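The interleaving scheme itself is simple to picture. Here is a minimal sketch in Python (purely illustrative, nothing like 3dfx's actual driver or hardware logic) of how scanlines divide between two boards:

```python
# Illustrative sketch of scan-line interleave: each of the two boards
# rasterizes every other line of the frame, so each board touches only
# half the scanlines.

def assign_scanlines(height, num_boards=2):
    """Map each board index to the list of scanlines it renders."""
    work = {board: [] for board in range(num_boards)}
    for line in range(height):
        work[line % num_boards].append(line)
    return work

# For a 480-line frame split across two boards, board 0 gets the even
# lines and board 1 the odd ones -- 240 lines each.
work = assign_scanlines(480)
print(len(work[0]), len(work[1]))  # 240 240
print(work[0][:3], work[1][:3])    # [0, 2, 4] [1, 3, 5]
```

The even/odd split is what gives the technique its name, and it is also why each board still needs a full copy of the scene: any triangle can cross both boards' scanlines.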

In addition to theoretically reducing scan time, SLI also doubled the available frame buffer memory. This allowed bigger models to be loaded and raised the maximum screen resolution. The texture memory, however, was not effectively doubled, because each AIB had to hold a duplicate copy of the scene data, and that, combined with other overhead, eroded the theoretical performance gain. As 3D models and screen resolutions grew larger, so did the size and number of texture maps, further eroding the promised benefits.

3Dfx tried to overcome this problem by adding a third chip, the texture mapping unit (TMU). The TMU allowed a second texture to be drawn during the same graphics engine pass with no performance penalty. At the time of its introduction, Voodoo2 was the only 3D AIB capable of single-cycle dual-texturing. Use of the Voodoo2's second TMU depended on the application software; however, two very popular games of the time, Quake II and Unreal, exploited dual-texturing with great success. In fact, by 1998 multi-texturing was nearly the standard.

Voodoo2 in SLI configuration (Source: Martín Gamero Prieto)

 

It took a little while before the price-performance analysis showed up. An 8MB Voodoo2 AIB sold for $249 in 1998, about $480 today, so two Voodoo2 AIBs cost about $500 then. The average performance improvement, however, was only about 60% to 70%, depending on the game and CPU. So the payoff was never there, nor could it ever be. But SLI had something more valuable: sex appeal.
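The arithmetic behind that verdict is straightforward. A back-of-envelope check using the article's own figures (the 1.65x speedup below is simply the midpoint of the quoted 60-70% range):

```python
# Two boards double the cost but deliver roughly 1.6-1.7x the frame
# rate, so performance per dollar goes down, not up.

SINGLE_PRICE = 249           # one 8MB Voodoo2, in 1998 dollars
SLI_PRICE = 2 * SINGLE_PRICE
SLI_SPEEDUP = 1.65           # midpoint of the 60-70% average gain

single_perf_per_dollar = 1.0 / SINGLE_PRICE
sli_perf_per_dollar = SLI_SPEEDUP / SLI_PRICE

print(SLI_PRICE)  # 498
# SLI delivers only ~82.5% of the single board's performance per dollar.
print(round(sli_perf_per_dollar / single_perf_per_dollar, 3))  # 0.825
```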

When Nvidia bought 3Dfx's assets in 2000, SLI was included in the IP package. However, Nvidia didn't (re)introduce it until 2004 because there were no motherboards with two AGP ports. And, being Nvidia, the company rebranded it as Scalable Link Interface. Nvidia also expanded the concept, making it capable of using up to four AIBs (which 3Dfx had done in the professional space with its Quantum3D products), and added multiple modes: split-frame rendering (half a frame per AIB), alternate-frame rendering, and even SLI anti-aliasing, as well as the ability to use an integrated GPU, a mode it called Hybrid SLI.
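The two main work-splitting modes differ in what they hand each board. A hypothetical sketch (illustrative Python, not Nvidia driver code): split-frame rendering (SFR) divides one frame into horizontal bands, while alternate-frame rendering (AFR) gives whole frames to the boards in turn.

```python
# SFR: each board renders a horizontal band of the same frame.
def split_frame(height, num_boards):
    """Return one scanline range per board, covering the whole frame."""
    band = height // num_boards
    return [range(i * band, (i + 1) * band if i < num_boards - 1 else height)
            for i in range(num_boards)]

# AFR: whole frame N goes to board N mod num_boards.
def alternate_frame(frame_index, num_boards):
    return frame_index % num_boards

bands = split_frame(1080, 2)
print([len(b) for b in bands])                    # [540, 540]
print([alternate_frame(f, 2) for f in range(4)])  # [0, 1, 0, 1]
```

SFR needs the workload balanced within a frame (scene complexity rarely splits evenly), while AFR adds a frame of latency, which is part of why per-game driver tweaks were needed.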

But expansion and rebranding couldn't change SLI's basic economics, and it never delivered more than about 170% of the performance for 200% of the cost; and AIBs were increasing in price yearly. In addition, the driver support Nvidia had to provide, amounting to a tweak for almost every game, was adding up with each new generation. But the concept still had sex appeal.

In late 2005, reacting to Nvidia's promotion of SLI, ATI, which AMD would acquire the following year, introduced its own version, called CrossFire. Then in 2013, AMD took the concept to the next level and eliminated the over-the-top (OTT) bridge strap. Instead, it used extended direct memory access (XDMA) to open a direct channel of communication between the multiple GPUs in a system, connected via the PCI Express interface.

AMD's XDMA channel operated over the same PCI Express (PCIe) interface the AMD AIBs already used. PCIe normally transfers graphics data between the GPUs, main memory, and the CPU. When AMD introduced XDMA, the AIBs of the time were not using all the bandwidth PCIe offered, which was considerably more than an OTT strap could provide: an external OTT bridge carried only 900MB/s, whereas PCIe 3.0 with 16 lanes could provide up to 32GB/s.

In 2017, as AMD and Nvidia rolled out DX12 AIBs, AMD dropped support for CrossFire and said, "In DirectX 12, we reference multi-GPU as applications must support mGPU, whereas AMD has to create the profiles for DX11. We've accordingly moved away from using the CrossFire tag for multi-GPU gaming."

AMD's added bandwidth and elimination of the OTT bridge (which in later days Nvidia began charging extra for) gave the company a competitive advantage. However, its AIBs of the time were not at the same performance level as Nvidia's, so the advantage didn't help much in the marketplace. Ironically, when AMD introduced the RX480 in 2016, the company suggested users buy two AMD AIBs, which AMD said would outperform one Nvidia AIB and cost less. It was a clever marketing pitch, but it didn't help AMD's sales. It also wasn't true.

Nvidia followed suit in 2019 and made it official in 2020. For its professional graphics AIB line, Quadro, Nvidia introduced a newer, much-higher-bandwidth scheme it calls NVLink for multi-AIB configurations. NVLink specifies a point-to-point connection with data rates of 20, 25, and 50 Gbit/s.

In late 2020, the company introduced a high-end consumer AIB, the RTX 3090, and made NVLink an option for it. The 350-watt RTX 3090 was introduced at $1,499. Nvidia sold bridges for it for $80; AIB partners were also able to make their own.

Physical compatibility depends on the industrial design, but the Nvidia direct bridge should work with any two identical cards from Nvidia or AIB partners.

It's unlikely that very many gamers will spend $3,000, plus another roughly $90 for the NVLink bridge, and possibly a larger power supply (PSU), for the added performance. However, content creators might.

If, like many, you're fascinated and curious about 3dfx and all the things it innovated (like SLI), then we recommend The Legacy of 3Dfx by Martín Gamero Prieto.