Reviews

Nvidia RTX 3070 fills a niche

Lots of power, not too expensive

Robert Dow

 

It seems amusing to describe a midrange AIB that sells for $500 as not too expensive, but that’s how average selling prices have moved, led by Nvidia. Nvidia is positioning its RTX 3070 as a replacement for the RTX 2070 or RTX 2080 Ti. Nvidia introduced the RTX 2070 in October 2018 at $599, and the RTX 2080 Ti in September 2018 at a $999 list price. The company then introduced the RTX 2070 Super in July 2019 for $499. So, there are several choices for generational comparisons.

The RTX 3070 is the latest release of Nvidia’s Ampere-based GPUs. The streaming multiprocessor (SM) is the building block of the Ampere architecture and consists of various cores, units, and memory. One of the significant changes in the Ampere SM is its 32-bit floating-point (FP32) throughput, which Nvidia has doubled. To accomplish this, the company designed a new datapath for FP32 and INT32 operations; with all four SM partitions combined, an SM executes 128 FP32 operations per clock.
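To make the per-clock figure concrete, here is a minimal sketch of how a peak FP32 number can be derived from it. It assumes that each FP32 operation is a fused multiply-add counted as two floating-point operations, which is the usual convention for quoted TFLOPS, and it uses the RTX 3080's SM count and boost clock from the specification table later in this review. The function name and parameters are illustrative only.

```python
def fp32_tflops(sm_count, fp32_ops_per_sm_per_clock, boost_mhz):
    """Peak FP32 throughput in TFLOPS.

    Assumption: every FP32 operation is a fused multiply-add,
    counted as 2 floating-point operations.
    """
    lanes = sm_count * fp32_ops_per_sm_per_clock   # FP32 lanes across the chip
    flops_per_second = lanes * 2 * boost_mhz * 1e6
    return flops_per_second / 1e12


# RTX 3080 figures from the specification table: 68 SMs, 1710 MHz boost clock.
rtx_3080 = fp32_tflops(sm_count=68, fp32_ops_per_sm_per_clock=128, boost_mhz=1710)
print(f"{rtx_3080:.2f} TFLOPS")  # prints "29.77 TFLOPS"
```

The result matches the 29.77 TFLOPS listed for the RTX 3080, and 68 SMs at 128 operations each also account for the 8704 shaders in the same table.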

The Ampere design incorporates three processor types in one chip. First, there is the programmable shader, which Nvidia introduced over 15 years ago. Second, the RT Cores accelerate ray-triangle and ray-bounding-box intersections. Third, the Tensor Cores form the AI processing pipeline. Each has a separate role to play, and they work in concert. Nvidia has made a nice-looking graphic to illustrate this.

Ampere processor family (Source: Nvidia)

 

The general functionality of the processors is:

  • Programmable shader: Increased to 2 shader calculations per clock versus 1 on Turing—20.3 Shader-TFLOPS compared to 7.9 TFLOPS.
  • 2nd Generation RT Core: Ray-triangle intersection throughput is now doubled so that the RT Core delivers 39.7 RT-TFLOPs, compared to Turing’s 23.8.
  • 3rd Generation Tensor Core: The new Tensor Core automatically identifies and removes less important DNN weights. The new hardware processes the sparse network at twice the rate of Turing: 162.6 Tensor-TFLOPS with sparsity, compared to Turing’s 63 TFLOPS.
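As a rough illustration of what "removing less important weights" can mean, the sketch below prunes a weight list so that only the two largest-magnitude values in every group of four survive. The 2-of-4 pattern is an assumption taken from Nvidia's public Ampere material rather than from anything stated in this review, and the function is a hypothetical software stand-in, not a description of what the hardware does internally.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude weights in every group of four.

    Assumption: a 2-of-4 structured sparsity pattern; the review itself
    does not specify the pattern.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes within this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]
print(prune_2_of_4(weights))
# prints [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.25, 0.0]
```

Half of the values are now zero in a predictable layout, which is what would let hardware skip them and approach the doubled rate quoted above.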

Nvidia employs the Tensor Cores in its deep learning super sampling (DLSS) technique to accelerate frame rate while improving the visual aspects of the image. Nvidia introduced DLSS in the Turing architecture. It leverages a deep neural network to extract multidimensional features of the rendered scene. It then combines details from multiple frames to construct a high-quality final image. That image looks comparable to native resolution while delivering higher performance. Essentially, the Tensor Cores allow DLSS to speed up a game while providing comparable images and sometimes, Nvidia claims, even more detailed images.

Nvidia surprised the world when it introduced its Turing design, which brought real-time ray-tracing to the gaming world. That brought realistic lighting, shadows, and effects to games never seen before. It enhanced image quality and immersion beyond what was imagined, but it didn’t speed up gameplay and was criticized for the cost. Nvidia corrected the performance issue with DLSS.

Nvidia claims the second-generation RT Cores in its Ampere architecture double the ray-intersection throughput of Turing’s RT Cores. Ray tracing is also processed concurrently with shading, says Nvidia.
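For readers unfamiliar with the kind of work an RT Core offloads, the following is a minimal software sketch of a ray-bounding-box test using the textbook slab method. It is offered only as an illustration of the intersection problem; it is not Nvidia's implementation, and the function name and argument layout are hypothetical.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does a ray hit an axis-aligned bounding box?

    The ray is origin + t * direction for t >= 0. All arguments are
    3-element sequences of floats.
    """
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval in which the ray is inside every slab so far.
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


box_min, box_max = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)
print(ray_hits_box((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), box_min, box_max))  # True
print(ray_hits_box((0.0, 0.0, -5.0), (0.0, 1.0, 0.0), box_min, box_max))  # False
```

A ray-traced frame runs tests like this, plus ray-triangle tests, an enormous number of times, which is why doubling intersection throughput in dedicated hardware matters.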

Cooler and quieter. Nvidia says the RTX 3070 flow-through cooling system is up to 16dBA quieter and has 44% higher thermal performance than the RTX 2070 Founders Edition. We tried to test this and could not determine any difference.
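Because decibels are logarithmic, a 16dBA reduction is larger than it may sound. The short sketch below converts a level difference into linear ratios using the standard decibel definitions; the helper names are illustrative, and the calculation says nothing about whether the claimed reduction is achieved in practice.

```python
def db_to_power_ratio(delta_db):
    """Ratio of sound power for a given level difference in dB."""
    return 10 ** (delta_db / 10)


def db_to_pressure_ratio(delta_db):
    """Ratio of sound pressure for a given level difference in dB."""
    return 10 ** (delta_db / 20)


claimed_reduction_db = 16  # Nvidia's "up to" figure quoted above
print(f"sound power ratio:    {db_to_power_ratio(claimed_reduction_db):.1f}x")
print(f"sound pressure ratio: {db_to_pressure_ratio(claimed_reduction_db):.1f}x")
```

On those definitions, 16dB corresponds to roughly a 40-fold drop in sound power, or about a 6-fold drop in sound pressure.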

Nvidia put several other features into the RTX 3070 that we did not test or evaluate, such as the Reflex latency technology, which the company claims lets gamers acquire targets faster, react quicker, and increase aim precision through a suite of new GeForce and G-SYNC technologies. These features optimize and measure system latency in competitive games, says the company. However, it is an SDK that game developers have to incorporate, and several have.

The company has also developed a broadcast encoder for people who stream their own gameplay or others’ gameplay. They call it Broadcast. And the company has a whole host of SDKs and other developer tools.

How does it compare?

We tested the RTX 3070 against an RTX 3080 and RTX 2070 Super on a 10th gen i9 system, Core i9-10900K at 3.7GHz. We also ran tests on an AMD Ryzen 9 3900X 12-core.

Nvidia RTX 3080 (top), RTX 2070 Super, and RTX 3070 (Photo credit:  Mark Poppin)

 

For games, we ran Metro Exodus with and without DLSS, Wolfenstein River Lab, and Shadow of the Tomb Raider with and without DLSS.

For synthetic tests, we ran Time Spy and Time Spy: Extreme, Port Royal, Novabench, Crytek Noir, Blender, Bright Memory Infinite, and Boundary for ray tracing.

We took the average fps of all the tests, the average score, and the average fps and score for the ray-tracing tests. Then we calculated four Pmark values for each AIB and got the following results.

On all four Pmark values, which weigh performance against price and power, the RTX 3070 was the clear winner.

Test results for RTX 3080, 3070, and 2070 Super

 

The specifications for the AIBs, test results, and Pmarks are shown in the following table.

| | RTX 3080 | RTX 3070 | RTX 2070 Super | 3080 vs. 3070 (% difference) | 3070 vs. 2070S (% difference) |
|---|---|---|---|---|---|
| Avg. FPS | 60.6 | 46.8 | 34.0 | 29% | 38% |
| Avg. score | 10894.2 | 9389.8 | 7660.8 | 16% | 23% |
| Avg. RT FPS | 33.1 | 23.8 | 16.0 | 39% | 48% |
| Avg. RT score | 10178.7 | 9072.7 | 7753 | 12% | 17% |
| GeForce | | | | | |
| Release date | 9/2020 | 10/2020 | 07/2019 | | |
| GPU | GA102 | GA104 | TU104 | | |
| Shaders | 8704 | 5888 | 2560 | 48% | 130% |
| TMUs | 272 | 184 | 160 | 48% | 15% |
| SM | 68 | 46 | 40 | 48% | 15% |
| GPU core clock (MHz) | 1440 | 1500 | 1605 | -4% | -7% |
| Boost clock (MHz) | 1710 | 1730 | 1770 | -1% | -2% |
| Process (nm) | 8 | 8 | 12 | 0% | -33% |
| Transistors (billions) | 28.3 | 17.4 | 13.6 | 63% | 28% |
| Die size (mm²) | 628 | 392.5 | 545 | 60% | -28% |
| AIB memory (GB) | 10 | 8 | 8 | 25% | 0% |
| Bus size (bits) | 320 | 256 | 256 | 25% | 0% |
| Bandwidth (GB/s) | 760.3 | 448 | 448 | 70% | 0% |
| Memory speed (Gbps) | 19 | 14 | 14 | 36% | 0% |
| Memory type | GDDR6X | GDDR6 | GDDR6 | | |
| TFLOPS FP32 | 29.77 | 20.37 | 9.06 | 46% | 125% |
| Power (W) | 320 | 220 | 215 | 45% | 2% |
| Price (MSRP) | $700 | $500 | $499 | 40% | 0% |
| k fps | 100000 | | | | |
| k score | 1000 | | | | |
| Pmark fps (all) | 27.03 | 42.54 | 31.66 | -36% | 34% |
| Pmark score (all) | 48.63 | 85.36 | 71.41 | -43% | 20% |
| Pmark fps (RT) | 14.79 | 21.60 | 14.92 | -32% | 45% |
| Pmark score (RT) | 45.44 | 82.48 | 72.27 | -45% | 14% |
Testing Data and Pmark Results 
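The Pmark rows appear to follow a simple relationship: performance divided by the product of price and power, scaled by the k constant listed in the table. That formula is inferred from the published numbers rather than stated in the text, so treat the sketch below as an assumption. It reproduces the "all tests" Pmark values to within the rounding of the published averages, and the percentage columns as the relative difference between two adjacent boards.

```python
def pmark(performance, price_usd, power_w, k):
    """Assumed Pmark formula: k * performance / (price * power)."""
    return k * performance / (price_usd * power_w)


def pct_diff(a, b):
    """Relative difference of a over b, in percent."""
    return (a - b) / b * 100


K_FPS, K_SCORE = 100_000, 1_000  # "k fps" and "k score" rows of the table

# Averages, MSRP, and board power taken from the table above.
boards = {
    "RTX 3080": {"fps": 60.6, "score": 10894.2, "price": 700, "power": 320},
    "RTX 3070": {"fps": 46.8, "score": 9389.8, "price": 500, "power": 220},
    "RTX 2070 Super": {"fps": 34.0, "score": 7660.8, "price": 499, "power": 215},
}

for name, b in boards.items():
    p_fps = pmark(b["fps"], b["price"], b["power"], K_FPS)
    p_score = pmark(b["score"], b["price"], b["power"], K_SCORE)
    print(f"{name:15s} Pmark fps {p_fps:6.2f}   Pmark score {p_score:6.2f}")

# Raw frame rate: the RTX 3080 is about 29% faster than the RTX 3070.
print(f"{pct_diff(boards['RTX 3080']['fps'], boards['RTX 3070']['fps']):.0f}%")
```

The design of the metric explains the ranking: the RTX 3080 is faster in absolute terms, but its higher price and power draw sit in the denominator, so the RTX 3070 comes out ahead on Pmark.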

 

| Component | Configuration |
|---|---|
| CPU | Core i9-10900K 3.7GHz |
| MB | Gigabyte Z490 AORUS Master |
| SSD | 512GB SanDisk |
| HDD | 2.5TB WDC |
| RAM | 16GB |
| Display | BenQ EL2870U, 27.8 in. |
TEST BED

 

Nvidia says the Ampere architecture is significantly better than Turing. “It’s our greatest generational leap,” said the company. “We knew a significant technology advance was needed to inspire content developers to create the next level of content and for the installed base to upgrade.”

So how did they do it? The new flagship Ampere gaming GPU architecture improves on everything introduced in Turing, providing the most significant generational leap in graphics performance. Every aspect of this 2nd-generation RTX GPU architecture has been improved, says the company.

The results certainly seem to support that claim.