Nvidia ditches Intel, cozies up to AMD with its new DGX A100

May 15, 2020

This morning, everybody found out what CEO Jensen Huang was cooking—an Ampere-powered successor to the Volta-powered DGX-2 deep learning system.

Yesterday, we described mysterious hardware in Huang’s kitchen as likely “packing a few Xeon CPUs” in addition to the new successor to the Tesla V100 GPU. Egg’s on our face for that one: the new system packs a pair of AMD Epyc 7742 64-core, 128-thread CPUs, along with 1TB of RAM, a pair of 1.92TB NVMe SSDs in RAID 1 for a boot drive, and up to four 3.84TB PCIe 4.0 NVMe drives in RAID 0 as secondary storage.

Goodbye Intel, hello AMD

Technically, it shouldn’t come as too much of a surprise that Nvidia would tap AMD for the CPUs in its flagship machine-learning nodes—Epyc Rome has been kicking Intel’s Xeon server CPU line up and down the block for quite a while now. Staying on the technical side of things, Epyc 7742’s support for PCIe 4.0 may have been even more important than its high CPU speed and massive core/thread count.

GPU-based machine learning frequently bottlenecks on storage, not CPU. The M.2 and U.2 interfaces used by the DGX A100 each use four PCIe lanes, so the shift from PCI Express 3.0 to PCI Express 4.0 doubles the available storage transport bandwidth from 32Gbps to 64Gbps (roughly 4GB/sec to 8GB/sec) per individual SSD.
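As a back-of-the-envelope check on that doubling, the sketch below computes the one-direction bandwidth of a four-lane link for each PCIe generation. The per-lane transfer rates and the 128b/130b encoding come from the PCIe specifications; the function and constant names are ours, and the "usable" figure ignores protocol overhead, so real SSD throughput will land somewhat lower.

```python
# Per-lane raw transfer rates in gigatransfers per second (PCIe 3.0 and 4.0).
GT_PER_SEC = {"3.0": 8.0, "4.0": 16.0}
# Both generations use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth(gen, lanes=4):
    """Return (raw Gbps, approximate usable GB/sec) for one direction of a link."""
    raw_gbps = GT_PER_SEC[gen] * lanes
    usable_gbytes = raw_gbps * ENCODING_EFFICIENCY / 8  # bits to bytes
    return raw_gbps, usable_gbytes


for gen in ("3.0", "4.0"):
    raw, usable = link_bandwidth(gen)
    print(f"PCIe {gen} x4: {raw:.0f} Gbps raw, about {usable:.2f} GB/sec usable")
```

Whatever the exact overhead, the ratio between the two generations stays at two to one, which is the part that matters for feeding GPUs from NVMe storage.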

There may have been a little bit of politics lurking behind the decision to change CPU vendors, as well. AMD might be Nvidia’s biggest competitor in the relatively low-margin consumer-graphics market, but Intel is muscling in on the data center side of the market. For now, Intel’s offerings in discrete GPUs are mostly vapor—but we know Chipzilla’s got much bigger and grander plans as it shifts its focus from the moribund consumer-CPU market to all things data center.

The Intel DG1 itself—which is the only real hardware we’ve seen yet—has leaked benchmarks that have it competing with the integrated Vega GPU from a Ryzen 7 4800U. But Nvidia might be more concerned about the Xe HP 4-tile GPU, whose 2048 EUs (execution units) might offer up to 36TFLOPS—which would at least be in the same ballpark as the Nvidia A100 GPU powering the DGX unveiled today.

DGX, HGX, SuperPOD, and Jetson

The DGX A100 was the star of today’s announcements. It’s a self-contained system featuring eight A100 GPUs with 40GB of GPU memory apiece. The US Department of Energy’s Argonne National Lab is already using one DGX A100 for COVID-19 research. The system’s nine 200Gbps Mellanox interconnects make it possible to cluster multiple DGX A100s, but those whose budget won’t support lots of $200,000 GPU nodes can make do by partitioning each A100 GPU into as many as seven instances, for up to 56 instances per system.

For those who do have the budget to buy and cluster oodles of DGX A100 nodes, they’re also available in an HGX (Hyperscale Data Center Accelerator) format. Nvidia says that a “typical cloud cluster” made up of its earlier DGX-1 nodes for training, along with 600 separate CPUs for inference, could be replaced by five DGX A100 units capable of handling both workloads. This would condense the hardware down from 25 racks to one, the power budget from 630kW to 28kW, and the cost from $11 million to $1 million.
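To put Nvidia's consolidation claim in proportion, here is a minimal sketch that turns the quoted before-and-after figures into reduction factors. The numbers are Nvidia's own, as reported above; the dictionary keys and the function name are just illustrative.

```python
# Figures quoted by Nvidia for the two configurations.
legacy_cluster = {"racks": 25, "power_kw": 630, "cost_million_usd": 11}
dgx_a100_cluster = {"racks": 1, "power_kw": 28, "cost_million_usd": 1}


def reduction_factors(before, after):
    """Return how many times smaller each 'after' figure is than 'before'."""
    return {key: before[key] / after[key] for key in before}


factors = reduction_factors(legacy_cluster, dgx_a100_cluster)
for metric, factor in factors.items():
    print(f"{metric}: {factor:.1f}x reduction")
```

By these vendor-supplied numbers, rack space shrinks 25-fold, power draw 22.5-fold, and cost 11-fold.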

If the HGX still doesn’t sound big enough, Nvidia has also released a reference architecture for its SuperPOD (no relation to Plume). Nvidia’s A100 SuperPOD connects 140 DGX A100 nodes and 4PB of flash storage over 170 InfiniBand switches, and it offers 700 petaflops of AI performance. Nvidia has added four of the SuperPODs to its own SaturnV supercomputer, which, according to Nvidia at least, makes SaturnV the fastest AI supercomputer in the world.
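For a sense of scale, the sketch below breaks the SuperPOD totals down per node and per GPU. It assumes eight A100 GPUs per DGX A100 and decimal units (1PB = 1,000TB); the variable names are ours. Keep in mind that "AI performance" is Nvidia's own headline metric, not a conventional double-precision benchmark figure.

```python
# SuperPOD figures as quoted by Nvidia.
SUPERPOD_NODES = 140
SUPERPOD_AI_PETAFLOPS = 700
SUPERPOD_FLASH_PB = 4
GPUS_PER_NODE = 8  # eight A100 GPUs per DGX A100

total_gpus = SUPERPOD_NODES * GPUS_PER_NODE
per_node_pflops = SUPERPOD_AI_PETAFLOPS / SUPERPOD_NODES
per_gpu_tflops = per_node_pflops * 1000 / GPUS_PER_NODE
flash_per_node_tb = SUPERPOD_FLASH_PB * 1000 / SUPERPOD_NODES

print(f"GPUs in one SuperPOD: {total_gpus}")
print(f"AI petaflops per DGX A100: {per_node_pflops:.1f}")
print(f"AI teraflops per A100: {per_gpu_tflops:.0f}")
print(f"Flash per node: about {flash_per_node_tb:.1f}TB")
```

That works out to 5 petaflops of "AI performance" per DGX A100 node, spread across 1,120 GPUs in a single SuperPOD.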

Finally, if the data center’s not your thing, you can have an A100 in your edge computing instead, with Jetson EGX A100. For those not familiar, Nvidia’s Jetson single-board platform can be thought of as a Raspberry Pi on steroids: it’s deployable in IoT scenarios but brings significant processing power to a small form factor that can be ruggedized and embedded in edge devices for applications such as robotics, health care, and drones.
