AMD’s Polaris generation comes to Radeon Pro
Alex Herrera on December 1st 2016
The progression of GPU products for workstations has for years followed a fairly predictable pattern. A new architectural generation appears every 18 months or so, coinciding with the launch of a family of chips targeting gaming markets (Radeon for AMD and GeForce for Nvidia). Sometime later, anywhere from one to three quarters, the same family of chips will emerge in the workstation-focused product lines (Radeon Pro for AMD and Quadro for Nvidia). The lag time is both justified by the extra work (e.g. drivers and certification) and tolerated by a market that doesn’t necessarily have to be bleeding edge.
It’s a sequence common to both GPU vendors, but the two have in recent years differed on where in the marketplace the first add-in card SKUs appear. Nvidia tends to drive a new GPU generation into its top-end SKUs first, just as it’s done this quarter with the new Pascal-based Quadro P5000 and P6000 (around $2,500 and $4,000 respectively). By contrast, we often see the first Radeon Pro (formerly FirePro) SKUs in the mid-range or even entry level. And that’s precisely the model AMD has followed now with its latest Polaris generation, launching three new workstation GPUs in the Radeon Pro WX 7100, WX 5100 and WX 4100.
Mid-range and Entry segments represented
Bearing street prices (JPR estimated) of around $260 and $325, respectively, the Radeon Pro WX 4100 and WX 5100 qualify as entry-class workstation GPUs, while the $625 WX 7100 fits in the middle of JPR’s mid-range. The WX 7100 and WX 5100 are built around the bigger 2304 processing core incarnation of Polaris (the Polaris 10 GPU, first released as the Radeon RX 480), while the WX 4100 is derived from the smaller 1024 core Polaris 11 chip.
All of them leverage Polaris features and architecture, but each comes with different settings for the three major dials vendors use to discriminate between price tiers: the number of enabled processing cores, memory size and memory bandwidth. The WX 7100 offers 2304 GCN stream processor cores (peak 5.7 TFLOPS) and 8 GB of memory, accessed at up to 224 GB/s (also a peak figure). Taking steps down from there, the WX 5100 and WX 4100 offer 1792 and 1024 stream processors, along with 8 and 4 GB of memory (respectively), while both are rated at 160 GB/s peak bandwidth.
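As a rough sanity check on the peak figure above, single-precision throughput is conventionally counted as two floating-point operations (one fused multiply-add) per stream processor per clock. The Python sketch below inverts that relationship to estimate the engine clock implied by 5.7 TFLOPS across 2304 cores. The function names are ours for illustration, and the resulting clock is an inference from the quoted numbers, not a figure taken from AMD’s spec sheet.

```python
# Assumption: peak FP32 throughput counts one fused multiply-add
# (2 floating-point operations) per stream processor per clock.
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Peak single-precision TFLOPS for a given core count and clock."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_ghz / 1000.0


def implied_clock_ghz(cores: int, tflops: float) -> float:
    """Clock (GHz) needed to reach a quoted peak TFLOPS figure."""
    return tflops * 1000.0 / (cores * FLOPS_PER_CORE_PER_CLOCK)


# WX 7100 figures quoted in the text: 2304 cores, 5.7 TFLOPS peak.
wx7100_clock = implied_clock_ghz(2304, 5.7)
print(f"Implied WX 7100 peak clock: {wx7100_clock:.3f} GHz")
```

Under that assumption the quoted peak works out to a clock of a little over 1.2 GHz. It is a ceiling, not a sustained rate.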
Both the Radeon Pro WX 5100 and WX 7100 are full-height PCI Express cards, supporting conventional deskside tower workstations. The WX 4100, however, is a half-height card following on the heels of the previous FirePro W4300 and targeting the hottest segment of the fixed workstation market: the small form factor (SFF) workstation. While the slimmer WX 4100 can also fit in micro and mini towers (AMD provides a full-height bracket), the WX 5100 and WX 7100 are too tall to fit in the majority of small form factor workstations. The only exception among top-tier OEM offerings is Fujitsu’s new Celsius J550, which provides a unique riser card to accommodate full-height add-ins. Complete product specs for the new Radeon Pro trio follow.
What Polaris brings to workstations
As was the case with its predecessors, the Polaris generation relies on the same basic shader array architecture introduced with first-generation GCN (“Graphics Core Next”) Southern Islands. The “atomic” processing element that forms the foundation of the architecture is the Compute Unit or CU, pictured below, built from 64 stream processing cores plus a supporting instruction unit, ALUs (vector and scalar), registers, and L1 cache. The first major incarnation of Polaris was the Polaris 10 GPU.
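Since every CU carries 64 stream processors, the stream processor counts quoted earlier for the three cards translate directly into enabled CU counts. The short Python sketch below does that division; the helper name is ours, and the results are derived from this article’s numbers, not copied from AMD’s documentation.

```python
STREAM_PROCESSORS_PER_CU = 64  # per the CU description above


def compute_units(stream_processors: int) -> int:
    """Number of enabled CUs for a given stream processor count."""
    cus, remainder = divmod(stream_processors, STREAM_PROCESSORS_PER_CU)
    if remainder:
        raise ValueError("stream processor count is not a multiple of 64")
    return cus


# Enabled stream processors per card, as quoted in this article.
enabled = {"WX 7100": 2304, "WX 5100": 1792, "WX 4100": 1024}
cu_counts = {name: compute_units(n) for name, n in enabled.items()}

for name, cus in cu_counts.items():
    print(f"{name}: {cus} CUs")
```

That gives 36, 28 and 16 CUs for the WX 7100, WX 5100 and WX 4100 respectively. It also shows how the WX 5100 is differentiated from the WX 7100 on the same Polaris 10 chip, by enabling fewer CUs.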
Now, stating that there are no major architectural changes in a GPU generation is no knock. The same can be said for Nvidia, which continues to leverage the same fundamental shader array architecture that it introduced in 2009’s Fermi. Consider the monumental task of recreating a ground-up architecture for a 5+ billion transistor chip, and you quickly realize the benefit of multi-generational longevity.
Moreover, the fact that engineering teams at both vendors recognize how daunting a task that can be is directly reflected in their respective architectures. Both were intentionally designed to scale gracefully with process shrinks: simply add more shader units and you’ve got a new generation and a new chip. Of course, nothing’s simple when you’re talking multi-billion transistor chips, but you get the point: it’s a much easier and more effective way to gain performance than a major overhaul of the base architecture every 18 to 24 months.
That said, by no means is either vendor’s chip architecture standing still. Each uses the opportunity of a new generational spin to make impactful new tweaks. Often that means adjusting cache and buffer sizes, both to take advantage of denser processes and to reduce idle “bubbles” and streamline throughput. But AMD made other significant architectural enhancements in Polaris as well, most notably the following:
- Improved 3D pipeline execution efficiency, through improved primitive discard (i.e. “aggressive culling”) acceleration, hardware scheduling, instruction pre-fetch, shader efficiency and memory compression. All of these features improve 3D pipeline efficiency and therefore throughput.
- Native support for fp16 and int16 datatypes (16-bit floating point and integer, respectively). Of limited impact on 3D graphics, the fp16 introduction is primarily justified by applications in “deep learning” and AI, ancillary areas that AMD is looking to leverage with its GPUs.
- On the display and multimedia side, AMD’s primary touted features include support for HDMI 2.0a, DP 1.3, H.265 (main 10) decode up to 5K and encode up to 60 fps.
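To give a feel for what the fp16 datatype trades away, the Python sketch below rounds a few values through IEEE 754 half precision using the standard library’s `struct` module (format character `e`). It illustrates the number format only and says nothing about how Polaris executes fp16 in hardware; the helper name is ours.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Storage cost: half precision takes 2 bytes, single precision takes 4.
fp16_bytes = len(struct.pack("<e", 1.0))
fp32_bytes = len(struct.pack("<f", 1.0))
print(f"fp16: {fp16_bytes} bytes, fp32: {fp32_bytes} bytes")

# Precision cost: 0.1 is not exactly representable and picks up a
# visible rounding error after a trip through fp16.
error = abs(to_fp16(0.1) - 0.1)
print(f"0.1 stored as fp16: {to_fp16(0.1)!r} (error {error:.2e})")

# Range cost: 65504 is the largest finite fp16 value.
print(f"largest finite fp16 value round-trips: {to_fp16(65504.0) == 65504.0}")
```

Half the storage per value means twice as many values per unit of memory bandwidth. That is attractive for workloads that tolerate the reduced precision and range, which is generally the argument made for deep learning.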
Testing with Viewperf 12
As is our norm for evaluating graphics cards, we ran the latest version of SPEC’s Viewperf. Viewperf 12 focuses workload on the graphics card, such that the rest of the system isn’t (or at least shouldn’t often be) the bottleneck. As a result, Viewperf will give a good idea of which card has the highest peak performance. However, it’s worth noting that the size of a card’s lead in Viewperf does not necessarily indicate the size of its lead in a real-world environment, where the rest of the system, the OS and the application may impose other bottlenecks.
With this benchmarking exercise, we relied again on our standard testbench, an Apexx 4 workstation graciously loaned to us by Boxx. We’d reviewed the Apexx 4 back in the spring of 2015, and it proved to have excellent performance, particularly for single-thread execution thanks to its liquid-cooled Core i7-6590K CPU. A high-performance platform is highly desirable for running Viewperf in particular, as it helps to ensure that bottlenecks that emerge are as much as possible due to the graphics subsystem, rather than some other weak link in the system, for example a slow disk. The rest of the Apexx 4 configuration complements the blistering 4.125 GHz CPU with 32 GB of memory and a SATA-based solid-state drive (SSD). Best, it provides a standard platform to compare multiple cards in a true apples-to-apples manner.
Side by side, the raw results for the Radeon Pro WX trio show what you’d expect, with the WX 7100 setting the high bar, while the scores sensibly decline stepping down to the WX 5100 and WX 4100. Likewise, looking at price-performance in scores/dollar, we see a sensible pattern in reverse, with the WX 4100 leading the way, trailed in turn by its higher-priced siblings, the WX 5100 and WX 7100.
Stacking up the Radeon Pro trio with closest Nvidia Quadro rivals
With just two viable suppliers of workstation graphics hardware, it’s natural to wonder how the rivals’ product lines compare. As it happens, we not only have Viewperf 12 results for the three Nvidia Quadro products against which this new Radeon Pro trio best matches up, but all tests were run on the exact same Boxx system. The mid-range WX 7100 is most comparable to Nvidia’s Quadro M4000, which began shipping in the fall of 2015. The WX 5100 lines up closely with the Quadro M2000, launched in the spring of 2016. And finally, the half-height WX 4100 matches up best with Nvidia’s SFF-focused, half-height Quadro K1200, launched in June of 2015.
How’d the scores compare? On average across viewsets, the Radeon Pro WX 7100 and WX 4100 outperformed their rivals by 19% and 43%, respectively, while the Quadro M2000 nudged out the WX 5100 by 2%. Results for each Radeon Pro SKU in the following charts are normalized to that card’s closest Quadro competitor, comparing raw scores, scores / dollar and scores / W.
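The normalization described above can be sketched in a few lines of Python. Each metric for a card is divided by the same metric for its closest rival, so a value above 1.0 means the card leads on that metric. The class and function names are ours, and the numbers are placeholders chosen only to make the arithmetic easy to follow; they are not measured scores, prices or TDPs for any real product.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    score: float  # composite benchmark score (higher is better)
    price: float  # estimated street price, USD
    watts: float  # rated board power, W


def normalized(card: Card, rival: Card) -> dict:
    """Card's metrics relative to its rival (1.0 means parity)."""
    return {
        "score": card.score / rival.score,
        "score_per_dollar": (card.score / card.price)
        / (rival.score / rival.price),
        "score_per_watt": (card.score / card.watts)
        / (rival.score / rival.watts),
    }


# Placeholder values for illustration only, not real product data.
card_a = Card("Card A", score=120.0, price=600.0, watts=150.0)
rival_b = Card("Rival B", score=100.0, price=800.0, watts=120.0)

result = normalized(card_a, rival_b)
for metric, ratio in result.items():
    print(f"{metric}: {ratio:.2f}")
```

With these placeholder inputs, Card A leads by 20% on raw score and by 60% on score per dollar, but trails slightly on score per watt. The same card can look quite different depending on which of the three ratios a buyer cares about.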
AMD positioning looks best when viewed in terms of price-performance (scores/dollar), and if you’ve followed this market, that shouldn’t surprise. As the market leader, Nvidia is able to command a higher price at comparable tiers. As such, AMD’s strongest sell has always been price-performance versus all-out performance. In this market, street prices tend to trail MSRPs dramatically, and more so for AMD, so we use estimated street prices (aggregating across many on-line retailers).
AMD’s positioning looks a bit worse when viewed in terms of watts, as AMD’s TDPs for comparable cards are all a bit higher than Nvidia’s comparable SKUs, despite similar processes. Both Polaris chips are built in a 14 nm FinFET process, which helps reduce power by allowing transistors to run with comparable performance at lower voltages or with higher performance at comparable voltages.
What do we think?
With this round of Radeon Pro cards, targeting the mid-range to entry segments of the market, AMD accomplished what it needed to. Leveraging the Polaris generation of GCN, all three GPUs deliver enough of a performance gain over their predecessors to be compelling upgrades. Furthermore, the three fit nicely into the overall product line, with sensible gains in performance stepping up the line and comparable gains in price-performance as one steps down.
In terms of its position today versus its rival Nvidia, the WX 4100 holds a substantial Viewperf 12 edge over the Quadro K1200 and the WX 7100 modestly outperforms the Quadro M4000, while the WX 5100 roughly matches the Quadro M2000. The position for all should look a bit better for users looking primarily at performance/dollar. In that respect, it’s mission accomplished for AMD, giving buyers a competitive alternative to Nvidia Quadro. End users and OEMs alike love to see AMD stay competitive.
However, these Viewperf metrics need to be taken in the appropriate context. In all cases, we’re contrasting just-launched Radeon Pro SKUs against significantly older Quadro SKUs. Nvidia released the mid-range Quadro M2000 a little over six months ago, while the M4000 and K1200 are about 12 and 18 months old, respectively. Furthermore, Nvidia is in the midst of rolling out SKUs that leverage its follow-on GPU generation, Pascal. Just launched were the ultra-high-end Quadro P6000 and P5000 (which, incidentally, we expect to review and test shortly), and given the usual progression, we’d expect to see successors to the M4000, M2000 and K1200 in the near future.
Bottom line, and as is so often the case, the market for workstation GPUs is in a phase of leapfrog. That’s no knock on AMD; it just means the other frog will get its turn shortly, and positions will most likely shift. A.H.