Posted: Alex Herrera 08.07.18
AMD has been making quite a bit of noise in the processor world in the last two years. The company’s accomplishments of late certainly justify its cranking up the volume, as it’s managed to roll out an impressive number of CPUs serving a wide span of price and performance points for both clients and servers. I’ve reviewed new workstation models from Boxx built both on high-end single-socket Threadripper and top-end dual-socket Epyc with an unheard-of 64 cores (32 × 2) of compute power. Both deliver performance that, depending on workload, can challenge the best of what Intel can offer.
With all that renewed enthusiasm buoyed by its strongest market position in over a decade, you might think AMD has driven up its penetration of the workstation market. But that’s not happened, at least not yet. Intel virtually owns the workstation platform today, with all major OEMs exclusively selling Core and Xeon based workstations, and just a few AMD models available from vendors like Boxx and some white-box vendors. Still, AMD is most certainly threatening, and both of its high-end workstation-relevant platforms, Threadripper and Epyc, have caught Intel’s attention in particular, causing the incumbent to adjust its products and strategy.
To fend off any possible incursion from 32C Epyc, Intel has positioned its highest-core-count Xeon Scalable with 28 cores. And as luck would have it, Boxx also built a new workstation on the dual Xeon Scalable platform, the Apexx D4, offering two sockets and 56 total cores. With Boxx sending me the D4, I could not only review the machine but also evaluate the capabilities of the dual 28C Xeon Scalable Platinum 8180 and assess how well it matches up against dual 32C Epyc. Thanks, Boxx.
Boxx executes again on its tried-and-true formula
Boxx has put together a lengthy and successful track record in the workstation industry by specifically not following the market leaders with me-too products. Rather, the company has developed a strong identity on the back of differentiated, performance-focused and application-focused machines. It intentionally doesn't compete on the same footing as high-volume suppliers HP, Dell or Lenovo, because it knows it can't compete on their playing field, one focused primarily on price and price-performance. On the flip side, by making very conscious design decisions in its workstation products, it also knows that HP, Dell, and Lenovo can't compete in its corners of the market either: ones that value top-end performance and will accept reasonably higher prices to get it.
No doubt, the term “good value” isn’t one many would think to apply to the Apexx D4 I have configured. The D4 starts at under $8K, but with two 28C Xeon Scalable Platinum 8180 CPUs, the top-end Quadro P6000 with 24 GB GDDR5, 128 GB of DDR4 memory and 512 GB of M.2 NVMe SSD, you know it’s not priced anywhere near the mainstream of the workstation market. Rather, its configuration price of $43K puts it at the far reaches of the workstation price spectrum. Looking at historical sales metrics by price point, a machine configured like this will only be a real consideration for less than 0.1% of the professional-client computing community. This beast of a machine is specifically targeted at the most performance-challenged applications, for which any possible speed-up will translate directly to business rewards (e.g., revenue, profitability, time-to-market, quality). If the rewards are significant enough, then yes, this machine can certainly be said to have “good value.”
|The consistent volume vs. price curve for deskside workstations puts a dual Xeon Scalable Platinum 8180 machine of this caliber in the upper reaches of the market|
As the following chart shows, the powerful Boxx machine and others like it occupy a slim portion of the overall market on a unit basis.
|Market distribution and Boxx’s high-end machine’s place in it|
On an average selling price (ASP) basis, the relationship is much more favorable. This is not a case of “if you build it, they will come”; this is a case of pent-up demand. As Jon is fond of saying, in computer graphics, too much is not enough.
|The burly Boxx tower exterior is immediately recognizable|
The Apexx D4’s exterior is classic Boxx. The design touches stand out as always, starting with a sturdy industrial enclosure without an ounce of molded plastic: nothing but steel and alloy here. Its front grill houses an easily releasable and cleanable dust filter, a feature that honestly should be on every workstation, given that accumulating dust is among the top contributors to eventual hardware failure, and one of the selling points and differentiators of a workstation is its long-term reliability.
Once inside, the eye immediately catches one of Boxx’s most common differentiators, liquid-cooling, a design feature that’s become Boxx’s calling card among workstation vendors. See liquid-cooling, and you might assume the dual Xeon Scalable Platinum 8180 CPUs are overclocked. That’s the typical goal for gaming rigs looking to eke out every possible drop of performance. But while these parts are running at 2.5 GHz, the fastest Intel specs for the 28C Xeon Scalable Platinum 8180 part, they are specifically not overclocked. Again, one of the hallmarks of a workstation is reliability — overclocking will wear out a processor. Keeping it running cool extends its life.
In the case of workstations rather than gaming rigs, OEMs may implement liquid cooling not to overclock but to serve other purposes. For example, the more efficient cooling should increase reliability at the same clock rate. And as important to many, liquid cooling can run quieter than purely air-cooled machines that have to move a lot more cfm of air through tight chassis quarters. I’ll assess precisely how well Boxx’s liquid cooling accomplishes that second goal when I review acoustic output under load ahead.
Matching the no-compromise choice in CPUs, our D4 came outfitted with a single top-of-the-line, cut-no-corners Nvidia Quadro P6000 GPU. However, the D4’s ample PCIe slots do allow for up to five GPUs to be installed, with options from AMD Radeon Pro as well (though not every combination is allowed, as power must be held within the capabilities of the D4’s 1500 W 80 Plus Gold (90% efficiency) modular power supply).
|The Apexx D4’s spacious slot space, room and power for plenty of GPUs|
Top view of the motherboard without, and with the Quadro AIB.
|The Apexx D4’s four drive bays abutting the dual-width Nvidia Quadro P6000|
Having reviewed many Boxx workstations over the years, including the recent Apexx S3 and Apexx 4 6301, I won’t belabor the details of the Apexx D4’s build breadth and quality. Suffice to say, the Apexx D4 is up to par with the rest of its Apexx siblings, and that’s high praise. Instead, given the 56C Xeon Scalable platform around which this machine was built, my focus will be to discern how capable that CPU platform is across a breadth of workstation-typical workloads, compared to a range of other possible (and just as legitimate) platform options.
Workstation performance as a function of cores: where the dual 28C Xeon Scalable Platinum 8180 fits
Fortunately, I’ve recently performed the same exercise on a wide range of workstation CPU platforms, from several cores at maximum frequency to maximum cores at modest frequency. Two were from Boxx and one from Velocity Micro: the first was a Boxx Apexx S3 with the 6C “Coffee Lake” CPU, whose cores were overclocked to nose-bleed levels; the second, a Boxx Apexx 4 6301 with a 16C Threadripper clocked at a moderate 3.4 GHz; and the third, the Velocity Micro ProMagix HD360A with dual 32C AMD Epyc CPUs (64 cores in total) running at a comparatively modest 2.2 GHz.
Into that mix we can now add this 2S 56C (28 × 2) Xeon Scalable Platinum 8180 Apexx D4, with a few cores fewer than the 64C Epyc platform but running at a modestly higher 2.5 GHz. With this range of CPU platforms, we’ve got a unique opportunity to evaluate the relative merits of very different workstation CPUs.
|Configuration specifications for our Apexx D4 and a few interesting contrasts in workstation CPU platforms|
The following chart shows the relative performance of those machines.
|The widely varying system specifications for our four test workstations (normalized to Boxx Apexx S3 with 6C Intel “Coffee Lake” Core i7 CPU)|
To gauge performance levels of this beast of a workstation from Boxx, we employed the latest 2.1 version of SPEC's workstation-focused benchmark, SPECwpc. While no benchmark is perfect, SPECwpc does the best job I’ve seen of stressing all workstation components in a whole-system environment that users may actually experience. It’s both broad and deep, and aggregates sub-tests into workload groups representative of the most common workstation verticals: Media and Entertainment, Product Development, Life Sciences, Financial Services, Energy and General Operations. It even borrows the same viewsets that its graphics-focused sister test, Viewperf, uses to measure 3D graphics performance.
Going into such an exercise, I’d expect to see a certain eventual outcome, give or take. Given the relative strengths and tradeoffs of the four workstations’ CPUs, I’d assume the 6C Coffee Lake machine would perform best on single-to-few threaded workloads, and the 56C Xeon Scalable and 64C Epyc machines would perform best on heavily threaded workloads. I also would assume the 16C Threadripper would look best in two respects: one, delivering an appealing balance of good performance on both minimally threaded and heavily threaded workloads, and two, winning out on price-performance for heavily threaded workloads. Besides confirming (or negating) those assumptions, the most interesting question is which platform gains the edge on the heavily threaded tests: the 64C lower-clocked Epyc or the 56C higher-clocked Xeon Scalable Platinum?
Now to emphasize, that spread in the best traits of each CPU platform doesn’t mean one is superior to another, but rather that the three types (let’s group the massively-core’d 56C Xeon and 64C Epyc as the same basic type) are equipped to operate better on different workloads. It’s like comparing a diesel engine that delivers huge torque at low RPM with a sports car engine that drives RPM high in order to achieve its horsepower: two different tools optimized for different applications.
It’s also worth emphasizing here that the faster-core versus more-cores decision represents a true engineering tradeoff, all else equal. That means the more you have of one, the less you’ll tend to have of the other. Ultimately, chip thermal, power, and electrical constraints will limit designers to how much they can push on one or the other design points. Populate a few cores and it’s far easier to drive the frequency up, but start piling on cores and (again, all else equal) the frequency will need to come down. It’s impossible, or at least very difficult and costly, to break that tradeoff and offer lots more cores at the same frequency.
|A true tradeoff: Core clock vs. core count|
How did the results stack up to expectations? First up, let’s take a look at the single-thread SPECwpc test scores for the four systems, normalized to the 6C Coffee Lake Apexx S3 scores (i.e. the Apexx S3 is always a “1”). Averaged across workloads, the Apexx S3 reigns supreme, albeit to varying degrees. However, it is nudged out by the slower-clocked Xeon Scalable on two tests. Since the Xeon can’t be superior in single-thread compute throughput, I’d attribute that result to memory sensitivity, given the Xeon Scalable Platinum’s superior on-chip cache and memory interface. Regardless, the 4.8 GHz 6C Coffee Lake crushes the competition on price-performance for the minimally-threaded tests, precisely as you’d think, as illustrated in the following chart.
|Relative performance for minimally-threaded, compute-focused (i.e. no Viewperf viewsets) SPECwpc tests — normalized to Apexx S3 / 4.8 GHz 6C Coffee Lake|
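The normalization used in these charts can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the raw scores and test names below are hypothetical placeholders, not measured SPECwpc results, and the helper name `normalize_to_baseline` is invented for this sketch.

```python
def normalize_to_baseline(scores, baseline):
    """Divide each system's per-test score by the baseline system's score."""
    base = scores[baseline]
    return {
        system: {test: value / base[test] for test, value in tests.items()}
        for system, tests in scores.items()
    }


# Hypothetical raw scores (placeholders only, NOT measured results).
raw_scores = {
    "Apexx S3 (6C Coffee Lake)": {"test_a": 4.0, "test_b": 2.0},
    "Apexx D4 (2 x 28C Xeon)": {"test_a": 3.0, "test_b": 2.2},
}

relative = normalize_to_baseline(raw_scores, "Apexx S3 (6C Coffee Lake)")
for system, tests in relative.items():
    print(system, {test: round(value, 2) for test, value in tests.items()})
```

By construction the baseline machine scores 1 on every test, so any value above 1 for another system marks a test where it beat the Apexx S3.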
We also compared the price-performance of the four machines, as shown in the next chart.
|Relative price-performance for minimally-threaded, compute-focused (i.e. no Viewperf viewsets) SPECwpc tests per dollar — normalized to Apexx S3 / 4.8 GHz 6C Coffee Lake|
Next up are the heavily-threaded SPECwpc tests, which take advantage of virtually all “logical” processing cores available in the underlying hardware. Logical cores include both the physical cores and the same number of virtual cores enabled by technology like Intel Hyper-Threading, which allows two threads to timeshare one physical core. Hyper-Threading is Intel's brand name, so AMD processors don't carry it, but AMD's Zen-based chips, Threadripper and Epyc included, support Simultaneous Multi-Threading (SMT), which works similarly.
Accordingly, the 6C Coffee Lake has up to 12 threads allocated, the 16C Threadripper up to 32 threads, the 2 × 28C Xeon Scalable Platinum up to 112 threads, and the 2 × 32C Epyc up to 128 threads. And again, just as expected, the 16C Threadripper system handily outpaces the 6C Coffee Lake, while the 2 × 32C Epyc and 2 × 28C Xeon Scalable Platinum crush both. Of course, the mileage from additional cores varies: the 16C Threadripper averages 1.8X the performance of the 6C Coffee Lake, the 2 × 32C Epyc averages around 3.3X the throughput, and the 2 × 28C Xeon Scalable Platinum manages around 4.3X.
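The logical-core arithmetic described above can be sketched as follows. This is a minimal illustration that assumes two hardware threads per physical core on every platform (Hyper-Threading or SMT enabled); the `CpuPlatform` class and its field names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuPlatform:
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int = 2  # assumption: Hyper-Threading / SMT is enabled

    @property
    def physical_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def logical_cores(self) -> int:
        return self.physical_cores * self.threads_per_core


platforms = [
    CpuPlatform("6C Coffee Lake", sockets=1, cores_per_socket=6),
    CpuPlatform("16C Threadripper", sockets=1, cores_per_socket=16),
    CpuPlatform("2 x 28C Xeon Scalable Platinum", sockets=2, cores_per_socket=28),
    CpuPlatform("2 x 32C Epyc", sockets=2, cores_per_socket=32),
]

for p in platforms:
    print(f"{p.name}: {p.physical_cores} physical, {p.logical_cores} logical")
```

Whether a given test actually fills all of those logical cores depends on the workload; the counts are only the upper bound the operating system can schedule onto.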
So the Xeon Scalable Platinum platform, despite 8 fewer cores, outperforms the 64C Epyc by a fair margin on average, though not consistently across all workloads. However, if we flip the script to price-performance, Xeon Scalable Platinum drops to the bottom, thanks to a price point that far exceeds all the others. And it’s in price-performance for heavily threaded apps where we see the 16C Threadripper shine, as we expected, outpacing all others. Even the 6C overclocked Coffee Lake bests the Xeon Scalable Platinum and Epyc on heavily threaded tests, simply because of its dramatically lower price point.
The spread in performance at each thread count can likely be attributed to differences in microarchitecture, for example, the aforementioned differences in cache size and memory access. And of course, even multi-threaded tests will scale with clock frequency as well. The outlier low-end results for the srmp test on Epyc present a bit of a quandary, as the test is heavily threaded and the logical cores appear heavily utilized during the run, meaning they don’t appear memory-starved.
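To make the price-performance flip concrete, here is a rough sketch of the calculation. The relative performance figures are the heavily-threaded averages quoted above and the $43K figure is the Apexx D4's configured price; the other three prices are hypothetical placeholders chosen only to illustrate the arithmetic, so the resulting numbers should not be read as actual results.

```python
# Average heavily-threaded performance relative to the 6C Coffee Lake (from the text).
relative_performance = {
    "6C Coffee Lake": 1.0,
    "16C Threadripper": 1.8,
    "2 x 32C Epyc": 3.3,
    "2 x 28C Xeon Scalable Platinum": 4.3,
}

# Only the Xeon system's $43K configured price comes from the review.
# The other three prices are PLACEHOLDERS for illustration only.
price_usd = {
    "6C Coffee Lake": 5_000,             # placeholder
    "16C Threadripper": 8_000,           # placeholder
    "2 x 32C Epyc": 25_000,              # placeholder
    "2 x 28C Xeon Scalable Platinum": 43_000,
}


def relative_price_performance(perf, price, baseline):
    """Performance per dollar, normalized so the baseline system equals 1."""
    base = perf[baseline] / price[baseline]
    return {name: (perf[name] / price[name]) / base for name in perf}


result = relative_price_performance(relative_performance, price_usd, "6C Coffee Lake")
for name, value in sorted(result.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

The point of the sketch is the shape of the calculation: a system can lead on raw throughput and still land last once each score is divided by what the machine costs.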
|Relative performance for heavily-threaded SPECwpc tests (normalized to Apexx S3 / 4.8 GHz 6C Coffee Lake)|
|Relative price-performance (scores/$) for heavily-threaded SPECwpc tests (normalized to Apexx S3 / 4.8 GHz 6C Coffee Lake)|
Finally, one SPECwpc test in particular, Handbrake, is interesting in the sense that it presents a range of threading during its execution. The primary Handbrake code may be single-threaded, but portions of the code it leverages, namely the codec(s), can harness up to 32 threads, with 16 concurrent threads predominant (consuming the capabilities of 16 cores). Fittingly, in the case of Handbrake, it’s the 2 × 28C Xeon Scalable Platinum that triumphs. It can handle more threads than the 6C Core i7 and Threadripper and execute them faster than Epyc. But again, looking at price-performance, Threadripper presents the best balance, just nudging out the 6C Coffee Lake.
|Relative performance for moderately-threaded Handbrake test (normalized to Apexx S3 / 4.8 GHz 6C Coffee Lake)|
|Relative price-performance (scores/$) for moderately-threaded Handbrake test (normalized to Apexx S3 / 4.8 GHz 6C Coffee Lake)|
Liquid cooling = quiet operation
Because all these workstations are high performers, they will all produce more noise than a lightly loaded entry-class workstation. That’s just the way it is, to some degree. You simply can’t break physical and thermodynamic laws. One can, however, do a better job of minimizing acoustic output through careful attention to chassis layout and airflow, and possibly by adding liquid cooling. The liquid cooling of the Apexx S3 and, even more so, the Apexx D4 shows the value of a thoughtfully engineered cooling approach. Both the liquid-cooled Boxx Apexx S3 (with overclocked 4.8 GHz 6C Coffee Lake Core i7) and the more conventional air-cooled Boxx with Threadripper produced more noise than a typical workstation. (For the record, liquid cooling should really be redefined as “liquid-assisted” cooling, as airflow and fans are typically still required to cool the liquid in the radiator at the enclosure’s vents, so fan noise and turbulence still have an impact, though to a lesser degree.)
Despite all the extra watts being consumed in the pursuit of overclocked performance, the Apexx S3 was not conspicuously noisy under nominal loading. Under heavy load (portions of SPECwpc), however, fan speeds kicked up significantly, driving perceptible noise from a generally tolerable 47.2 dB to around 50.7 dB (measured at roughly the distance from under the desk to the ears). That’s a level I would find annoying, were it to last for lengthy periods. The Apexx 4 with Threadripper also got a little loud under heavy loading (same portions of SPECwpc), but on the order of 49.2 dB peak, a little lower than the S3. That makes sense, as the S3 probably has to push airflow (cfm) on the order of the Apexx 4’s but through a significantly smaller volume, leading to both higher fan speed and more air turbulence.
However, the air- and fan-induced noise of the ProMagix HD360A hit another level entirely. My reasonably-well-calibrated and standardized testing showed 52 dB at idle and 55 dB under load, levels I personally would only be able to tolerate for lengthy periods with noise-canceling headphones. I did not get a chance to assess a ProMagix HD360A with the liquid-cooling option, as it was not available at the time of review, but that option should substantially drop noise levels.
By contrast, the noise of the Apexx D4, by virtue of Boxx’s choice to liquid cool (and its well-thought-out cooling approach), was well within reasonable noise levels, even under heavy load. I measured the same 49.2 dB peak I did for the Threadripper, a nudge below the Apexx S3 and far below that of the ProMagix HD360A. That level again is one that, while noisier than a commercial PC or entry workstation, is well within reason for the average user, and certainly acceptable considering the performance being delivered.
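Because decibels are logarithmic, a gap of a few dB is larger than it looks. As a rough sketch, the snippet below converts the differences between the readings quoted above into sound-intensity ratios using the standard relation ratio = 10^(ΔdB/10). It assumes the readings are directly comparable (same distance, same weighting), and the function name `intensity_ratio` is invented for this example.

```python
def intensity_ratio(louder_db: float, quieter_db: float) -> float:
    """Sound-intensity ratio implied by a difference between two dB readings."""
    return 10 ** ((louder_db - quieter_db) / 10)


# Peak readings under load, as quoted in the review.
readings_db = {
    "Apexx D4": 49.2,
    "Apexx 4 (Threadripper)": 49.2,
    "Apexx S3": 50.7,
    "ProMagix HD360A": 55.0,
}

reference = readings_db["Apexx D4"]
for name, level in readings_db.items():
    ratio = intensity_ratio(level, reference)
    print(f"{name}: {level} dB, {ratio:.2f}x the sound intensity of the Apexx D4")
```

Intensity is not the same thing as perceived loudness, which grows more slowly, so these ratios are best read as a physical measure of how much more acoustic energy the louder machines emit.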
What do we think?
The 2 × 28C Xeon Scalable Platinum 8180 nudges out 2 × 32C Epyc on performance, but AMD presents a compelling price-performance argument
It’s hard to call AMD and Intel rivals in workstations. The former hasn’t been in a position to compete for a decade and the latter holds virtually 100% of the market. But that could change, assuming AMD makes the market a strategic target for its Zen-based CPU portfolio. OEMs would welcome its participation, giving them an alternative to the de facto monopoly Intel currently holds, and the dictates that come with that.
In the upper end of the mainstream single-socket workstation market, AMD’s Threadripper can challenge Xeon W, Intel’s premium CPU offering in the space. And in the top end of the market, the domain of the dual socket workstation, Intel’s Xeon Scalable now has another contender to keep an eye on, AMD’s Epyc.
A compelling machine for the few
Like the Velocity Micro machine reviewed prior, the Boxx Apexx D4 configured like ours is purpose-built to serve the top 0.1% of the workstation market, the highest-demand corner that struggles with computing workloads, specifically highly parallel ones. These customers are highly motivated to reduce the execution times for those kinds of workloads. Those users know who they are, and they know there are very few available products capable enough to address their demands. Ideally, they’re aware of the compromises they’ll need to accept in order to get that type of performance.
Running the wide range of SPECwpc tests across a range of workstation CPU platforms that all would consider “high performance” in some respect, we see how different choices in balancing core count and core frequency (setting aside price for the moment) yield varying benefits in throughput, depending on workload. Of the four, the 6C, max-clocked Coffee Lake Core i7 stands out in minimally-threaded workloads and is the champion of price-performance for such workloads. The 2.5 GHz 2 × 28C Xeon Scalable and 2.2 GHz 64C Epyc clearly represent the closest head-to-head matchup, particularly as configured. The former outperforms the latter more often than not, and on average presents a considerable edge. However, factor in price and the more economical Epyc wins convincingly on price-performance. Finally, Threadripper represents precisely what we thought, a sensible compromise on multi-core performance, single-thread performance and cost.
Intel could have its hands full with AMD in workstations … if AMD decides it’s committed to the market
AMD has the products it needs to compete at all tiers of the deskside workstation market, but the company also needs to display the fortitude to compete in a workstation market currently and completely owned by Intel. To date, none of the top-volume vendors HP, Dell or Lenovo has chosen to offer any of AMD’s workstation-appropriate CPUs in their respective workstation line-ups. The reason? Well, I’ll take the liberty to speak for them… they want to not only see AMD with a set of products that can compete across the range of their products today, they want the confidence the company can do so for the long term, across multiple generations. And more than that, they want to see that AMD is strategically committed as a partner in the market. They don’t want to see a repeat of the mid to late 00’s, when several (including HP) dove in on AMD’s then-highly-competitive Opteron products to market workstations around AMD, only to see the CPU line’s competitiveness fade in the years following. And they don’t want to see AMD deciding a year down the road that workstations aren’t a priority and that it’s really just focused on servers and high-performance corporate and consumer desktops.
Fortunately, AMD has one of the top workstation vendors not named HP, Dell or Lenovo signed up for Threadripper and Epyc. The machines Boxx has put together, as always, showcase the performance the platform is capable of in a package that users will want, with the quality, reliability, serviceability, aesthetics, and ergonomics (e.g. noise) to satisfy. Boxx can help champion Zen-based CPUs for workstation duty, but ultimately it will come down to AMD’s fortitude. It has fired shots across Intel’s bow, catching the incumbent’s attention. The question now is how it will follow through.
Also, as long as Fujitsu and Lenovo do not offer AMD, Boxx gains a competitive advantage in the marketplace with customers who like what AMD has to offer.