AMD executives presented their vision for AI, including new data center CPU and GPU products, at an event in San Francisco on Tuesday. They displayed practically everything in the company’s portfolio capable of running AI and wheeled out customer after hyperscale customer to drive home the point that they are succeeding in supercomputers and at hyperscale. But what everyone really wants to know is: Can AMD’s new GPU rival competitor Nvidia’s flagship H100 GPU? And while it looks to beat the H100 in raw performance, will that be enough?
“AI is the defining technology shaping the next generation of computing,” AMD CEO Lisa Su said at the event. “Frankly, it’s AMD’s largest and most strategic growth opportunity.”
With the H100 reportedly in short supply at the moment, due to heightened demand for training and inference of large language models (LLMs) like ChatGPT, there has never been a better moment to announce a credible H100 competitor.
AMD’s line of data center GPUs includes the previously announced MI300A, which will power the 2-exaFLOPS El Capitan supercomputer currently being commissioned at Lawrence Livermore National Laboratory. The MI300A is a 13-chiplet design with a mix of CPU and GPU chiplets, a popular combination for HPC.
During AMD’s event, Su shared details of the new MI300X—a GPU-only version of the MI300A that swaps out the MI300A’s three CPU chiplets for two additional GPU chiplets. More HBM memory and memory bandwidth have also been added, specifically to optimize for LLMs and other AI workloads. The GPU chiplets are based on AMD’s third-generation CDNA 3 architecture, which adds a new compute engine with AI-friendly data formats not available in the MI250 and earlier products.
Overall, the new flagship GPU has a dozen 5- and 6-nm chiplets, for 153 billion transistors total. It features 192 GB HBM3 memory with 5.2 TB/s memory bandwidth. For comparison, Nvidia’s H100 comes in a version with 80 GB HBM2e, with a total of 3.3 TB/s. That puts the MI300X at 2.4× the HBM capacity and 1.6× the HBM bandwidth.
“With all of that extra capacity, we have an advantage for larger models because you can run larger models directly in memory,” Su said. “For the largest models, that reduces the number of GPUs you need, speeding up performance— especially for inference—and reducing [total cost of ownership, TCO].”
In other words, forget “the more you buy, the more you save” (per Nvidia CEO Jensen Huang’s 2018 speech): AMD is saying you can get away with fewer GPUs if you want to. The overall effect is that cloud service providers can run more inference jobs per GPU, lowering the cost of LLMs and making them more accessible to the ecosystem. It also reduces the development time needed for deployment, Su said.
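Su’s capacity argument comes down to back-of-the-envelope arithmetic: how many GPUs does it take just to hold a model’s weights in memory? The sketch below illustrates the idea; the 70-billion-parameter model size and the 20% overhead factor are illustrative assumptions, not AMD-published figures—only the 192 GB and 80 GB memory capacities come from the announcement.

```python
import math

def gpus_needed(params_billion: float, bytes_per_param: int,
                gpu_mem_gb: float, overhead: float = 1.2) -> int:
    """Estimate how many GPUs are needed just to hold a model's weights.

    `overhead` roughly accounts for KV cache and activations during
    inference (an assumption; real deployments vary widely).
    """
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB
    return math.ceil(weights_gb * overhead / gpu_mem_gb)

# A hypothetical 70B-parameter model in FP16 (2 bytes per parameter):
print(gpus_needed(70, 2, 192))  # MI300X, 192 GB HBM3  -> 1
print(gpus_needed(70, 2, 80))   # H100, 80 GB HBM2e    -> 3
```

By this rough measure, a model that spills across multiple 80-GB H100s can fit on a single MI300X, which is the TCO case Su was making.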
AMD also showed off the AMD Instinct Platform, an 8x MI300X system (with two AMD Genoa host CPUs) analogous to Nvidia’s HGX H100, in OCP-compatible format. This system is intended to drop into existing infrastructure quickly and easily. It will be sampling to key customers in Q3.
The jewel in Nvidia’s crown is its mature AI and HPC software stack, CUDA. This is often cited as one of the key reasons AI chip startups have struggled to take market share from the leader.
“No question, the software is so important for enabling our hardware to be deployed broadly,” Su said, admitting that software has been “a journey.”
AMD calls its AI software stack ROCm (“Rock-’em”); Su said a “tremendous” amount of progress has been made in the last year.
In contrast to Nvidia’s CUDA, “a significant portion” of ROCm is open, said AMD President Victor Peng. The open portion includes drivers, language, runtime, tools like AMD’s debugger and profiler, and libraries. ROCm also supports open frameworks, models and tools, with optimized kernels for HPC and AI. AMD has been working with PyTorch to ensure day-zero support for PyTorch 2.0, and to test the PyTorch-ROCm stack works as promised.
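In practice, the PyTorch-ROCm integration means CUDA-targeted PyTorch code largely runs unchanged on AMD hardware: ROCm builds of PyTorch expose the familiar `torch.cuda` API and dispatch it to AMD GPUs. A minimal sketch (run here on CPU for portability; on a ROCm build with an Instinct GPU present, `torch.cuda.is_available()` returns true and the same code targets the GPU):

```python
import torch

# On a ROCm build of PyTorch, torch.version.hip is set and the
# torch.cuda API maps to AMD GPUs, so no code changes are needed.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(1024, 1024, device=device)
y = x @ x.T  # on AMD hardware this matmul dispatches to ROCm libraries
print(y.shape)
```

This API compatibility is the crux of AMD’s day-zero PyTorch 2.0 support: the goal is that existing models port with little or no modification.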
AMD also announced a new collaboration with HuggingFace. HuggingFace will optimize thousands of its models for AMD Instinct accelerators, as well as other parts in the AMD edge portfolio.
HuggingFace CEO Clem Delangue took the stage to talk about democratization of AI and LLMs.
“It’s really important that hardware doesn’t become the bottleneck or gatekeeper for AI when it develops,” he said. “What we are trying to do is to extend the range of options to AI builders for training and inference. We are excited about the ability of AMD in particular to power LLMs in data centers, thanks to the memory capacity and bandwidth advantage [of the MI300X].”
Raw performance vs. timing
While the MI300X looks to beat the H100 on raw performance, Nvidia has a few tricks up its sleeve, including the transformer engine, a software feature that enables mixed-precision training of LLMs for better throughput. On the software side, despite AMD’s progress on ROCm, and efforts like OpenAI’s Triton, Nvidia’s CUDA still dominates. Software maturity has been a huge challenge for startups in this field.
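Mixed-precision training of the kind the transformer engine accelerates can be sketched generically in PyTorch with autocast. To be clear, this is an analogue of the general technique, not Nvidia’s Transformer Engine itself, which additionally selects FP8/FP16 formats per layer dynamically; the tiny model here is purely illustrative.

```python
import torch

# Generic mixed-precision training step: forward pass runs in a lower
# precision (bfloat16 here, on CPU for portability) to improve
# throughput, while master weights stay in FP32.
model = torch.nn.Linear(512, 512)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(8, 512)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(x).square().mean()  # mean squared activation as a toy loss
loss.backward()  # gradients flow back to the FP32 parameters
opt.step()
```

The throughput win comes from doing the expensive matrix multiplies in the narrower format; the hardware-level format juggling is what vendor features like the transformer engine automate.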
And then there is timing. Nvidia H100s are available now (if you can get hold of one), but MI300X is not set to ramp until Q4. Nvidia is expected to unveil a new generation of its GPU architecture in 2024, which would potentially put the MI300X on the back foot once again. The cadence means AMD will perpetually be following, not leading.
Future AMD GPUs will likely continue to leverage its chiplet expertise; while Nvidia has multi-chip products like the Grace Hopper superchip, it has not yet moved to chiplets in the same way.
Will this earlier move to chiplets work out to AMD’s advantage? It seems inevitable Nvidia will have to move to chiplets (following Intel and AMD) eventually, but how soon this will happen is still unclear.
Is the MI300X compelling enough to take at least some share of the data center AI market from Nvidia? It certainly looks that way, given AMD’s existing customer base in HPC and data center CPUs—a huge advantage over startups.
One thing is for sure: The size of this opportunity is more than big enough for two players.