It seems amusing to describe a midrange AIB that sells for $500 as not too expensive, but that’s how average selling prices have moved, led by Nvidia. Nvidia is positioning its RTX 3070 as a replacement for the RTX 2070 or RTX 2080 Ti. Nvidia introduced the RTX 2070 in October 2018 at $599, and the RTX 2080 Ti in September 2018 at $999 ($1,199 for the Founders Edition). The company then introduced the RTX 2070 Super in July 2019 for $499. So there are several choices for generational comparisons.
The RTX 3070 is the latest release in Nvidia’s line of Ampere-based GPUs. The building block of the Ampere architecture is the streaming multiprocessor (SM), which consists of various cores, units, and memory. One of the most significant changes in the Ampere SM is 32-bit floating-point (FP32) throughput, which Nvidia has doubled. To accomplish this, the company designed a new datapath that can execute either FP32 or INT32 operations, so that with all four SM partitions combined, an SM can execute 128 FP32 operations per clock.
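Peak FP32 throughput follows directly from the SM count. The sketch below uses the conventional peak-rate formula (FP32 lanes × 2 FLOPs per fused multiply-add × boost clock). The function name is illustrative, and the result is a theoretical peak that assumes the shared FP32/INT32 datapath issues only FP32 work. The inputs are the boards’ published specifications: 46 SMs at a 1,730 MHz boost clock for the RTX 3070, and 40 Turing SMs with 64 FP32 lanes each at 1,770 MHz for the RTX 2070 Super.

```python
def peak_fp32_tflops(sm_count: int, fp32_lanes_per_sm: int, boost_mhz: float) -> float:
    """Theoretical peak FP32 TFLOPS, counting each fused multiply-add (FMA) as 2 FLOPs."""
    shader_cores = sm_count * fp32_lanes_per_sm
    flops_per_clock = shader_cores * 2  # one FMA = one multiply + one add
    return flops_per_clock * boost_mhz * 1e6 / 1e12


# Ampere GA10x SM: 4 partitions x 32 FP32 lanes = 128 FP32 operations per clock.
rtx_3070 = peak_fp32_tflops(sm_count=46, fp32_lanes_per_sm=128, boost_mhz=1730)
# Turing SM: 4 partitions x 16 FP32 lanes = 64 FP32 operations per clock.
rtx_2070_super = peak_fp32_tflops(sm_count=40, fp32_lanes_per_sm=64, boost_mhz=1770)

print(f"RTX 3070:       {rtx_3070:.2f} TFLOPS")        # 20.37
print(f"RTX 2070 Super: {rtx_2070_super:.2f} TFLOPS")  # 9.06
```

With 68 SMs at 1,710 MHz, the same function gives the RTX 3080’s 29.77 TFLOPS.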
The Ampere design incorporates three processor types in one chip. First, there is the programmable shader, which Nvidia introduced over 15 years ago. Second, the RT Cores accelerate ray-triangle and ray-bounding-box intersection tests. Third, the Tensor Cores form the AI processing pipeline. Each has a separate role, and they work in concert. Nvidia has produced a graphic to illustrate this.
Ampere processor family (Source: Nvidia)
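RT Cores find ray hits by walking a bounding volume hierarchy (BVH): a ray is tested against boxes first, and only against the triangles inside the boxes it hits. Nvidia does not publish the exact hardware algorithm, so the sketch below uses the classic slab test, a standard software method, to show what a single ray-box test computes. The function name is illustrative.

```python
from typing import Sequence


def ray_hits_aabb(origin: Sequence[float], direction: Sequence[float],
                  box_min: Sequence[float], box_max: Sequence[float]) -> bool:
    """Slab test: does the ray origin + t * direction, t >= 0, hit the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes, so the origin must lie between them.
            if o < lo or o > hi:
                return False
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near = max(t_near, min(t1, t2))  # latest entry across all slabs
        t_far = min(t_far, max(t1, t2))    # earliest exit across all slabs
        if t_near > t_far:
            return False
    return True


unit_box = ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
print(ray_hits_aabb((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), *unit_box))  # True
print(ray_hits_aabb((3.0, 0.0, -5.0), (0.0, 0.0, 1.0), *unit_box))  # False
```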
The general functionality of the processors is:
- Programmable shader: Increased to two shader calculations per clock versus one on Turing, for 20.3 shader-TFLOPS compared to 7.9 TFLOPS.
- 2nd-generation RT Core: Ray-triangle intersection throughput is doubled, so the RT Core delivers 39.7 RT-TFLOPS compared to Turing’s 23.8.
- 3rd-generation Tensor Core: According to Nvidia, the new Tensor Core automatically identifies and removes less important DNN weights. The new hardware processes the sparse network at twice the rate of Turing, for 162.6 Tensor-TFLOPS with sparsity compared to Turing’s 63 TFLOPS.
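Nvidia’s Ampere documentation describes the sparsity feature as fine-grained 2:4 structured sparsity: in each group of four consecutive weights, at most two are nonzero, so the Tensor Cores can skip half of the multiplications. The sketch below shows one common way to produce that pattern, magnitude pruning (keep the two largest-magnitude weights in each group), plus a dot product that counts only the multiplications it actually performs. The function names are illustrative; this is neither Nvidia’s pruning tool nor the hardware’s datapath.

```python
from typing import List, Tuple


def prune_2_of_4(weights: List[float]) -> List[float]:
    """Magnitude pruning to a 2:4 pattern: zero the two smallest-magnitude weights per group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Keep the indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


def sparse_dot(weights: List[float], activations: List[float]) -> Tuple[float, int]:
    """Dot product that multiplies only nonzero weights; returns (result, multiply count)."""
    total, multiplies = 0.0, 0
    for w, a in zip(weights, activations):
        if w != 0.0:
            total += w * a
            multiplies += 1
    return total, multiplies


weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2, 0.02, 0.6]
activations = [1.0] * len(weights)
pruned = prune_2_of_4(weights)
print(pruned)                           # [0.9, 0.0, 0.3, 0.0, -0.7, 0.0, 0.0, 0.6]
print(sparse_dot(pruned, activations))  # multiply count is 4, not 8
```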
Nvidia employs the Tensor Cores in its Deep Learning Super Sampling (DLSS) technique to raise frame rates while improving the visual quality of the image. Nvidia introduced DLSS with the Turing architecture. DLSS uses a deep neural network to extract multidimensional features of the rendered scene, then combines details from multiple frames to construct a high-quality final image. That image looks comparable to native resolution while delivering higher performance. Essentially, the Tensor Cores allow DLSS to speed up a game while providing comparable images, and sometimes, Nvidia claims, even more detailed ones.
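Much of the speedup comes from shading fewer pixels: the game renders at a lower internal resolution, and the network reconstructs the output resolution. Below is a minimal sketch of that arithmetic, assuming a 50% per-axis render scale (the ratio DLSS 2.0 uses in its Performance mode); the function names are illustrative. Actual frame-rate gains are smaller than the pixel ratio suggests, because not all frame time scales with resolution and the network’s own inference takes time on the Tensor Cores.

```python
from typing import Tuple


def internal_resolution(out_w: int, out_h: int, axis_scale: float) -> Tuple[int, int]:
    """Resolution rendered before upscaling; axis_scale applies to width and height separately."""
    return round(out_w * axis_scale), round(out_h * axis_scale)


def shaded_pixel_fraction(out_w: int, out_h: int, axis_scale: float) -> float:
    """Fraction of the output's pixels that are actually shaded each frame."""
    w, h = internal_resolution(out_w, out_h, axis_scale)
    return (w * h) / (out_w * out_h)


# Assumption: 4K output with a 50% per-axis render scale.
print(internal_resolution(3840, 2160, 0.5))    # (1920, 1080)
print(shaded_pixel_fraction(3840, 2160, 0.5))  # 0.25
```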
Nvidia surprised the world when it introduced its Turing design, which brought real-time ray tracing to gaming, along with realistic lighting, shadows, and effects never before seen in games. It enhanced image quality and immersion beyond what had been imagined, but it didn’t speed up gameplay, and Turing was criticized for its cost. Nvidia addressed the performance issue with DLSS.
Nvidia claims the second-generation RT Cores in the Ampere architecture double the ray-triangle intersection throughput of Turing’s RT Cores. Ray tracing can also be processed concurrently with shading, says Nvidia.
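Ray-triangle intersection is the operation whose throughput the second-generation RT Core doubles. Nvidia does not document how the hardware computes it, so the sketch below uses the Möller–Trumbore algorithm, a widely used software method, purely to show what each test involves: a handful of cross and dot products and a few comparisons per ray-triangle pair. Helper names are illustrative.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin: Vec3, direction: Vec3, v0: Vec3, v1: Vec3, v2: Vec3,
                 eps: float = 1e-9) -> Optional[float]:
    """Möller–Trumbore test: distance t along the ray to the hit, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:               # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det          # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None    # ignore hits behind the ray origin


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(ray_triangle((0.2, 0.2, 1.0), (0.0, 0.0, -1.0), *triangle))  # 1.0
print(ray_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle))  # None
```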
Cooler and quieter. Nvidia says the RTX 3070’s flow-through cooling system is up to 16 dBA quieter and delivers 44% higher thermal performance than the RTX 2070 Founders Edition. We tried to test this but could not determine any difference.
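Because decibels are logarithmic, a 16 dBA reduction is larger than the number suggests. Below is a minimal sketch of the arithmetic, using the standard definition of the decibel for sound power and the common psychoacoustic rule of thumb that each 10 dB drop sounds about half as loud (an approximation, not a precise model). The function names are illustrative.

```python
def power_ratio_from_db(db_drop: float) -> float:
    """Sound-power ratio for a decibel reduction (dB = 10 * log10(power ratio))."""
    return 10 ** (db_drop / 10)


def perceived_loudness_ratio(db_drop: float) -> float:
    """Rule of thumb: every 10 dB drop is heard as roughly half as loud."""
    return 2 ** (db_drop / 10)


drop_dba = 16.0  # Nvidia's "up to" figure for the RTX 3070 cooler
print(f"Sound power reduced about {power_ratio_from_db(drop_dba):.1f}x")            # 39.8x
print(f"Perceived loudness reduced about {perceived_loudness_ratio(drop_dba):.1f}x")  # 3.0x
```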
Nvidia put several other features into the RTX 3070 that we did not test or evaluate, such as Reflex, a latency technology that the company claims lets gamers acquire targets faster, react quicker, and aim more precisely through a suite of new GeForce and G-SYNC technologies. According to Nvidia, these features optimize and measure system latency in competitive games. However, Reflex is an SDK that game developers have to incorporate, and several have.
The company has also developed an application called Broadcast for people who stream their own or others’ gameplay. Nvidia also offers a whole host of SDKs and other developer tools.
How does it compare?
We tested the RTX 3070 against an RTX 3080 and an RTX 2070 Super on a system with a 10th-generation Intel Core i9-10900K (3.7 GHz). We also ran tests on a system with a 12-core AMD Ryzen 9 3900X.
Nvidia RTX 3080 (top), RTX 2070 Super, and RTX 3070 (Photo credit: Mark Poppin)
For games, we ran Metro Exodus with and without DLSS, Wolfenstein River Lab, and Shadow of the Tomb Raider with and without DLSS.
For synthetic tests, we ran 3DMark Time Spy and Time Spy Extreme, Port Royal, Novabench, Crytek Neon Noir, Blender, Bright Memory Infinite, and Boundary for ray tracing.
We averaged the fps and the scores across all tests, and separately averaged the fps and scores of the ray-tracing tests. We then calculated four Pmark values for each AIB and got the following results.
On all four Pmark measures, the RTX 3070 was the clear winner.
Test results for RTX 3080, 3070, and 2070 Super
The specifications for the AIBs, test results, and Pmarks are shown in the following table.
| | RTX 3080 | RTX 3070 | RTX 2070 Super | 3080 vs. 3070 (%) | 3070 vs. 2070S (%) |
|---|---|---|---|---|---|
| Avg. FPS | 60.6 | 46.8 | 34.0 | 29% | 38% |
| Avg. score | 10894.2 | 9389.8 | 7660.8 | 16% | 23% |
| Avg. RT FPS | 33.1 | 23.8 | 16.0 | 39% | 48% |
| Avg. RT score | 10178.7 | 9072.7 | 7753 | 12% | 17% |
| GeForce specifications | | | | | |
| Release date | 9/2020 | 10/2020 | 7/2019 | | |
| GPU | GA102 | GA104 | TU104 | | |
| Shaders | 8704 | 5888 | 2560 | 48% | 130% |
| TMUs | 272 | 184 | 160 | 48% | 15% |
| SMs | 68 | 46 | 40 | 48% | 15% |
| Base clock (MHz) | 1440 | 1500 | 1605 | -4% | -7% |
| Boost clock (MHz) | 1710 | 1730 | 1770 | -1% | -2% |
| Process (nm) | 8 | 8 | 12 | 0% | -33% |
| Transistors (billions) | 28.3 | 17.4 | 13.6 | 63% | 28% |
| Die size (mm²) | 628 | 392.5 | 545 | 60% | -28% |
| Memory (GB) | 10 | 8 | 8 | 25% | 0% |
| Memory bus (bits) | 320 | 256 | 256 | 25% | 0% |
| Bandwidth (GB/s) | 760.3 | 448 | 448 | 70% | 0% |
| Memory speed (Gbps) | 19 | 14 | 14 | 36% | 0% |
| Memory type | GDDR6X | GDDR6 | GDDR6 | | |
| FP32 TFLOPS | 29.77 | 20.37 | 9.06 | 46% | 125% |
| Power (W) | 320 | 220 | 215 | 45% | 2% |
| MSRP | $700 | $500 | $499 | 40% | 0% |
| k fps | 100000 | | | | |
| k score | 1000 | | | | |
| Pmark fps (all) | 27.03 | 42.54 | 31.66 | -36% | 34% |
| Pmark score (all) | 48.63 | 85.36 | 71.41 | -43% | 20% |
| Pmark fps (RT) | 14.79 | 21.60 | 14.92 | -32% | 45% |
| Pmark score (RT) | 45.44 | 82.48 | 72.27 | -45% | 14% |
Testing data and Pmark results
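The Pmark values in the table are consistent with a simple value metric: average performance divided by the product of MSRP and board power, multiplied by the k constant listed for fps or scores. The formula in the sketch below is a reconstruction from the published numbers, not a formal definition. Recomputed from the rounded averages shown, it reproduces the score-based Pmarks exactly and the fps-based ones to within a few hundredths, a gap presumably due to rounding of the averages. The percentage columns are the ratio of the two boards’ values minus one.

```python
def pmark(performance: float, price_usd: float, power_w: float, k: float) -> float:
    """Value metric: performance per (dollar x watt), scaled by the constant k."""
    return k * performance / (price_usd * power_w)


def pct_difference(a: float, b: float) -> float:
    """How much larger a is than b, in percent (a / b - 1)."""
    return (a / b - 1.0) * 100.0


K_FPS, K_SCORE = 100_000, 1_000  # the table's "k fps" and "k score" constants

# Averages, MSRP (USD), and board power (W) from the test-results table.
boards = {
    "RTX 3080": (60.6, 10894.2, 700, 320),
    "RTX 3070": (46.8, 9389.8, 500, 220),
    "RTX 2070 Super": (34.0, 7660.8, 499, 215),
}

for name, (fps, score, price, power) in boards.items():
    print(f"{name}: Pmark fps {pmark(fps, price, power, K_FPS):.2f}, "
          f"Pmark score {pmark(score, price, power, K_SCORE):.2f}")

# 3070 vs. 2070 Super average fps, as in the table's percentage column.
print(f"{pct_difference(46.8, 34.0):.0f}%")  # 38%
```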
| Component | Specification |
|---|---|
| CPU | Intel Core i9-10900K, 3.7 GHz |
| Motherboard | Gigabyte Z490 AORUS Master |
| SSD | 512 GB SanDisk |
| HDD | 2.5 TB WDC |
| RAM | 16 GB |
| Display | BenQ EL2870U, 27.8 in. |

Test bed
Nvidia says the Ampere architecture is significantly better than Turing. “It’s our greatest generational leap,” said the company. “We knew a significant technology advance was needed to inspire content developers to create the next level of content and for the installed base to upgrade.”
So how did Nvidia do it? According to the company, its new flagship Ampere gaming GPU builds on everything Nvidia invented and introduced in Turing, providing the most significant generational leap in graphics performance. Every aspect of this second-generation RTX GPU architecture has been improved, says Nvidia.
The results certainly seem to support that claim.