AI Compiler Technologies

Interesting information about AI compilers

Performant Flash Attention (v2) for Everyone

Flash Attention was introduced in 2022 as a fast and memory-efficient exact attention algorithm that uses tiling and algebraic aggregation to reduce the number of memory reads/writes between GPU high-bandwidth memory (HBM) and GPU on-chip SRAM. This made it significantly faster and less memory-hungry than standard attention implementations, especially for […]
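To make the tiling and online-softmax idea concrete, here is a minimal NumPy sketch. This is an illustration of the algebraic aggregation trick only, not the actual Flash Attention GPU kernel (which keeps each tile in on-chip SRAM); block size and shapes here are arbitrary choices for the demo.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Flash-Attention-style tiling (illustrative): walk over K/V in
    # blocks, keeping a running row max and running exp-sum so the
    # softmax is computed "online" without the full score matrix.
    N, d = Q.shape
    O = np.zeros_like(Q)
    m = np.full(N, -np.inf)   # running row max
    l = np.zeros(N)           # running row sum of exponentials
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)          # scores for this block only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)          # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V)))  # True
```

Because the per-block partial sums are rescaled as the running max grows, the tiled result matches the naive softmax exactly; on a GPU this is what lets the kernel avoid ever writing the N×N attention matrix to HBM.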

Read More

Ultra-realistic Generative AI based rendering in Blender with SHARK and AMD Radeon™ PRO W7900 GPUs

At NAB 2023, we are demonstrating the SHARK Stable Diffusion integration with Blender on the AMD Radeon™ PRO W7900, featuring 48 GB of memory. Come check it out in the AMD Innovation Zone. Generative AI is transforming entire industries. Today we are demonstrating an integration of the SHARK Stable Diffusion REST API with […]

Read More

Unleashing PyTorch 2.0 TorchDynamo powered by SHARK

PyTorch 2.0 brings exciting new technologies such as TorchDynamo, focused on machine learning model capture in Python. The nod.ai team, along with other Torch-MLIR community members, has been adding support for TorchDynamo in Torch-MLIR over the past few months. We are now proud to have started shipping Torch-MLIR […]

Read More

SHARK: The fastest PyTorch runtime – 3x over TorchScript, 1.6x over TF/XLA, 76% faster than ONNXRuntime

Introducing SHARK – a high-performance PyTorch runtime that is 3x faster than PyTorch/TorchScript, 1.6x faster than TensorFlow+XLA, and 76% faster than ONNXRuntime on the NVIDIA A100. All of this is available to deploy seamlessly in minutes. Whether you are using Docker, Kubernetes, or plain old `pip […]

Read More