Nvidia RTX 3070 fills a niche

It seems amusing to describe a midrange AIB that sells for $500 as not too expensive, but that’s how average selling prices have moved, led by Nvidia. Nvidia is positioning its RTX 3070 as a replacement for the RTX 2070 or RTX 2080 Ti. Nvidia introduced the RTX 2070 in October 2018 at $599 and the RTX 2080 Ti in September 2018 at $1,199 (Founders Edition prices). The company then introduced the RTX 2070 Super in July 2019 for $499. So, there are several choices for generational comparisons.

The RTX 3070 is the latest release of Nvidia’s Ampere-based GPUs. The streaming multiprocessor (SM) is the building block of the Ampere GPU and consists of various cores, units, and memory. One of the significant changes in the Ampere SM is that Nvidia has doubled the 32-bit floating-point (FP32) throughput. To accomplish this, the company designed a new datapath for FP32 and INT32 operations; with all four SM partitions combined, each SM executes 128 FP32 operations per clock.
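The per-SM figure can be related to the shader counts and FP32 TFLOPS that appear in the specification table later in this article. The following is a minimal sketch, not Nvidia's own calculation: it assumes peak throughput is counted as one fused multiply-add (two floating-point operations) per FP32 unit per clock at the listed boost clock, and it assumes 64 FP32 units per Turing SM, a figure that comes from Nvidia's published Turing specifications rather than from this article. The function and constant names are illustrative.

```python
# Peak FP32 throughput estimated from SM count and boost clock.
FP32_PER_SM_AMPERE = 128  # from the article: all four SM partitions combined
FP32_PER_SM_TURING = 64   # assumption: taken from Nvidia's Turing specs


def fp32_cores(sm_count, fp32_per_sm):
    """Total FP32 units (shaders) on the GPU."""
    return sm_count * fp32_per_sm


def peak_fp32_tflops(cores, boost_mhz):
    """Peak TFLOPS, counting a fused multiply-add as two operations."""
    return cores * 2 * boost_mhz * 1e6 / 1e12


# SM counts and boost clocks as listed in the article's table.
rtx3080_cores = fp32_cores(68, FP32_PER_SM_AMPERE)
rtx2070s_cores = fp32_cores(40, FP32_PER_SM_TURING)

rtx3080_tflops = peak_fp32_tflops(rtx3080_cores, 1710)
rtx2070s_tflops = peak_fp32_tflops(rtx2070s_cores, 1770)

print(f"RTX 3080: {rtx3080_cores} shaders, {rtx3080_tflops:.2f} TFLOPS")
print(f"RTX 2070 Super: {rtx2070s_cores} shaders, {rtx2070s_tflops:.2f} TFLOPS")
```

Under these assumptions the sketch reproduces the shader counts and FP32 TFLOPS listed for the RTX 3080 and RTX 2070 Super, which suggests the table's TFLOPS row was derived the same way.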

The Ampere design incorporates three processor types in one chip. First, there is the programmable shader, which Nvidia introduced over 15 years ago. Second, the RT Cores accelerate ray-triangle and ray-bounding-box intersections. Third, the Tensor Cores form the AI processing pipeline. Each has a separate role to play as they work in concert. Nvidia has made a nice-looking graphic to illustrate this.

Ampere processor family (Source: Nvidia)

The general functionality of the processors is:

Programmable shader: Increased to two shader calculations per clock versus one on Turing, for 20.3 Shader-TFLOPS compared to Turing’s 7.9.
2nd generation RT Core: Ray-triangle intersection throughput is now doubled, so the RT Core delivers 39.7 RT-TFLOPS compared to Turing’s 23.8.
3rd generation Tensor Core: The new Tensor Core automatically identifies and removes less important DNN weights. The new hardware processes the sparse network at twice the rate of Turing, for 162.6 Tensor-TFLOPS with sparsity compared to Turing’s 63.
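To make the idea of removing less important weights concrete, here is a small software sketch of 2:4 structured pruning, the pattern Nvidia has publicly described for Ampere's sparse Tensor Core path: in every group of four weights, the two with the smallest magnitude are zeroed. The article itself does not describe the pattern, and the function name `prune_2_of_4` is hypothetical. This illustrates the concept only; it is not how the hardware or Nvidia's tools implement it.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude weights in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.01, 0.0]
sparse_row = prune_2_of_4(row)
print(sparse_row)
```

Because exactly half of every group is zero, hardware that skips the zeros can in principle do the same work in half the multiply operations, which is the source of the "with sparsity" throughput figure.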

Nvidia employs the Tensor Cores in its deep learning super sampling (DLSS) technique to raise frame rates while improving the visual quality of the image. Nvidia introduced DLSS with the Turing architecture. It uses a deep neural network to extract multidimensional features of the rendered scene and then combines details from multiple frames to construct a high-quality final image. That image looks comparable to native resolution while delivering higher performance. Essentially, the Tensor Cores allow DLSS to speed up a game while providing comparable images and sometimes, Nvidia claims, even more detailed ones.

Nvidia surprised the world when it introduced its Turing design, which brought real-time ray tracing to the gaming world. That brought realistic lighting, shadows, and effects never seen before in games. It enhanced image quality and immersion beyond what was imagined, but it didn’t speed up gameplay and was criticized for its cost. Nvidia corrected the performance issue with DLSS.

Nvidia claims the second-generation RT Cores in its Ampere architecture double the ray-intersection throughput of Turing’s RT Cores. Ray tracing is also processed concurrently with shading, says Nvidia.
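For readers unfamiliar with what a ray-triangle intersection involves, the following is a software sketch of the widely used Möller-Trumbore test. It is offered only to show the kind of arithmetic an RT Core offloads from the shaders; the article does not say which algorithm Nvidia's hardware uses, and all function names here are illustrative.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the hit point, or None."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    p = cross(direction, edge2)
    det = dot(edge1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t_vec, edge1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, q) * inv_det
    return t if t > eps else None  # ignore hits behind the origin


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri)
miss = ray_triangle((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *tri)
print(hit, miss)
```

A ray-traced frame runs a test like this for every ray against many candidate triangles, which is why dedicated intersection hardware that runs concurrently with shading matters for frame rate.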

Cooler and quieter. Nvidia says the RTX 3070 flow-through system is up to 16dBA quieter and has 44% higher thermal performance than the RTX 2070 Founders Edition. We tried to test this and could not determine any difference.

Nvidia put several other features into the RTX 3070 that we did not test or evaluate, such as the Reflex latency technology, which the company claims lets gamers acquire targets faster, react quicker, and increase aim precision through a suite of new GeForce and G-SYNC technologies. These features optimize and measure system latency in competitive games, says the company. However, it is an SDK that game developers have to incorporate, and several have.

The company has also developed a broadcast encoder for people who stream their own gameplay or others’ gameplay. They call it Broadcast. The company also has a whole host of SDKs and other developer tools.

How does it compare?

We tested the RTX 3070 against an RTX 3080 and an RTX 2070 Super on a 10th-gen Intel system with a Core i9-10900K at 3.7GHz. We also ran tests on a 12-core AMD Ryzen 9 3900X.

Nvidia RTX 3080 (top), RTX 2070 Super, and RTX 3070 (Photo credit: Mark Poppin)

For games, we ran Metro Exodus with and without DLSS, Wolfenstein River Lab, and Shadow of the Tomb Raider with and without DLSS.

For synthetic tests, we ran Time Spy and Time Spy: Extreme, Port Royal, Novabench, Crytek Noir, Blender, Bright Memory Infinite, and Boundary for ray tracing.

We took the average fps of all the tests, the average score, and the fps and scores for ray tracing. Then we calculated four Pmark values for each AIB and got the following results.

In all four Pmark measures, the RTX 3070 was the clear winner.

Test results for RTX 3080, 3070, and 2070 Super

The specifications for the AIBs, test results, and Pmarks are shown in the following table.

	RTX 3080	RTX 3070	RTX 2070 Super	3080-3070 % Difference	3070-2070S % Difference
Avg. FPS	60.6	46.8	34.0	29%	38%
Avg. score	10894.2	9389.8	7660.8	16%	23%
Avg. RT FPS	33.1	23.8	16.0	39%	48%
Avg. RT score	10178.7	9072.7	7753	12%	17%
GeForce
Release Date	9/2020	10/2020	07/2019
GPU	GA102	GA104	TU104
Shaders	8704	5888	2560	48%	130%
TMUs	272	184	160	48%	15%
SM	68	46	40	48%	15%
GPU Core Clock MHz	1440	1500	1605	-4%	-7%
Boost clock MHz	1710	1730	1770	-1%	-2%
Process nm	8	8	12	0%	-33%
Transistors (Billions)	28.3	17.4	13.6	63%	28%
Die Size (mm2)	628	392.5	545	60%	-28%
AIB Memory GB	10	8	8	25%	0%
Bus size bits	320	256	256	25%	0%
Bandwidth GB/s	760.3	448	448	70%	0%
Memory Speed Gbps	19	14	14	36%	0%
Memory Type	GDDR6X	GDDR6	GDDR6
TFLOPS FP32	29.77	20.31	9.06	47%	124%
Power W	320	220	215	45%	2%
Price MSRP	$700	$500	$499	40%	0%
k fps	100000
k score	1000
Pmark fps (all)	27.03	42.54	31.66	-36%	34%
Pmark score (all)	48.63	85.36	71.41	-43%	20%
Pmark fps (RT)	14.79	21.60	14.92	-32%	45%
Pmark score (RT)	45.44	82.48	72.27	-45%	14%
Testing Data and Pmark Results
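The article does not spell out how the Pmark values are computed. The figures in the table are consistent with dividing performance by the product of price and power, then scaling by the listed constant k. The sketch below works under that assumption; the function names `pmark` and `pct_diff` are illustrative, and the percentage formula (first value relative to second) is likewise inferred from the table's difference columns.

```python
def pmark(performance, price_usd, power_w, k):
    """Assumed Pmark: scaled performance per dollar per watt."""
    return k * performance / (price_usd * power_w)


def pct_diff(a, b):
    """Percentage by which a exceeds b (negative if a is smaller)."""
    return (a / b - 1.0) * 100.0


# Average scores, MSRP, and board power as listed in the table.
cards = {
    "RTX 3080": {"score": 10894.2, "price": 700, "power": 320},
    "RTX 3070": {"score": 9389.8, "price": 500, "power": 220},
    "RTX 2070 Super": {"score": 7660.8, "price": 499, "power": 215},
}
K_SCORE = 1000  # the table's "k score" constant

pmark_score = {
    name: pmark(c["score"], c["price"], c["power"], K_SCORE)
    for name, c in cards.items()
}

for name, value in pmark_score.items():
    print(f"{name}: Pmark score (all) = {value:.2f}")

gain_over_2070s = pct_diff(pmark_score["RTX 3070"], pmark_score["RTX 2070 Super"])
gap_3080_vs_3070 = pct_diff(pmark_score["RTX 3080"], pmark_score["RTX 3070"])
print(f"3070 vs 2070 Super: {gain_over_2070s:.0f}%")
print(f"3080 vs 3070: {gap_3080_vs_3070:.0f}%")
```

Because price and power both sit in the denominator, the RTX 3080's higher raw scores are outweighed by its $700 price and 320W draw, which is why the RTX 3070 leads every Pmark row despite trailing on raw fps.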

Test Bed
CPU	Core i9-10900K 3.7GHz
MB	Gigabyte Z490 AORUS Master
SSD	512GB SanDisk
HDD	2.5TB WDC
RAM	16GB
Display	BenQ EL2870U 27.8-inch

Nvidia says the Ampere architecture is significantly better than Turing. “It’s our greatest generational leap,” said the company. “We knew a significant technology advance was needed to inspire content developers to create the next level of content and for the installed base to upgrade.”

So how did they do it? The new flagship Ampere gaming GPU architecture improves on everything introduced in Turing, providing the most significant generational leap in graphics performance. Every aspect of this 2nd generation RTX GPU architecture has been improved, says the company.

The results certainly seem to support that claim.
