At CES 2023, we are showing our Stable Diffusion demonstration on the Radeon™ RX 7900 XTX in the AMD booth. Come check it out at The Venetian – Titian 2304.

Generative AI has taken the world by storm, but until now generating an image from a text prompt with the typical 50 steps took a while on a GPU. The fastest generally available solutions on Windows start at 5 seconds or higher, unless you want to start copying DLLs by hand to upgrade the torch libraries. There is also a wide variety of accuracy-degrading performance optimizations such as xFormers and Flash Attention; these are great tools if you are open to trading accuracy for performance, but we wanted to unlock maximum performance without any of the accuracy-degrading optimizations.
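As context for the step count mentioned above, here is a toy sketch (not SHARK's actual pipeline; all names are illustrative) of why the number of inference steps dominates latency: each step invokes the denoising network once, so total runtime scales roughly linearly with the step count.

```python
import numpy as np

def fake_denoiser(x, t):
    """Stand-in for the real UNet; the real network call is what costs time."""
    return 0.1 * x

def sample(num_inference_steps=50, seed=0):
    """Toy iterative-denoising loop: one network call per step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((4, 64, 64))  # latent-sized starting noise
    calls = 0
    for t in range(num_inference_steps):
        eps = fake_denoiser(x, t)  # predict noise
        x = x - eps                # one toy denoising update
        calls += 1
    return x, calls

_, calls = sample()
print(calls)  # one denoiser call per step -> 50
```

Halving the step count roughly halves latency, which is why faster samplers and faster per-step execution (the focus here) are the two levers for interactive image generation.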

The team is pleased to announce Stable Diffusion image generation accelerated on the AMD RDNA™ 3 architecture, running on this beta driver from AMD. We have been optimizing this state-of-the-art model to generate Stable Diffusion images, using 50 steps with FP16 precision and negligible accuracy degradation, in a matter of seconds.
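The "negligible accuracy degradation" of FP16 can be made concrete with a small, illustrative measurement (not part of the SHARK pipeline): half precision carries about 11 significand bits, so round-tripping FP32 activations through FP16 perturbs values only on the order of one part in a few thousand.

```python
import numpy as np

# Illustrative only: round-off introduced by casting FP32 values to FP16,
# the precision used for the 50-step generation described above.
def fp16_roundtrip_error(n=10_000, seed=0):
    x = np.random.default_rng(seed).standard_normal(n).astype(np.float32)
    x16 = x.astype(np.float16).astype(np.float32)  # FP32 -> FP16 -> FP32
    return float(np.abs(x16 - x).max() / np.abs(x).max())

err = fp16_roundtrip_error()
print(f"max relative error: {err:.2e}")
```

This contrasts with optimizations like approximate attention, which change what the model computes rather than merely how precisely it is stored.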

We’re very excited to see the Stable Diffusion model ported to run performantly on AMD’s RDNA™ 3 architecture, in collaboration with the MLIR, IREE and OpenXLA community.

Dan Wood, VP AMD

We believe that Generative AI should be accessible to everyone irrespective of their technical background, so we made our Stable Diffusion WebGUI easy to access and use. Today you can download a single file and get started on your Generative AI endeavor. The community has reported that it runs on older-generation hardware dating back five years.

Here are images generated by the SHARK community on AMD RDNA™ architecture-based devices in the #ai-art Discord channel. Please check the Discord for artist credits.

Give it a try, and share and show off what you can create with Generative AI. We are not done with performance, ease of use or feature requests, so stay tuned for more over the coming weeks.

SHARK is an open-source, cross-platform (Windows, macOS and Linux) machine learning distribution packaged with torch-mlir (for seamless PyTorch integration), LLVM/MLIR (for re-targetable compiler technology) and IREE (for efficient codegen, compilation and runtime), along with additional tuning. IREE is part of the OpenXLA Project, an ecosystem of ML compiler and infrastructure technologies being co-developed by AI/ML industry leaders including AMD, Google, and many more. OpenXLA aims to let ML developers build models in their preferred framework (TensorFlow, PyTorch, JAX) and easily execute them with high performance across a wide range of hardware backends (GPU, CPU, and ML accelerators).

It was fantastic to see the Nod/AMD collaboration produce the great results it has. Beyond the numbers, I am really proud that we were able to create an engaged community that is empowered to make this kind of project happen. That was a key reason I started IREE and was ultimately behind the decision to become part of the OpenXLA project. As part of OpenXLA, we’ll work closely with our community to carry this momentum forward.

Stella Laurenzo, IREE co-founder, OpenXLA community leader, Google ML Compilers

We are hiring: join us on Discord.
