Articles by: Nod Team

SHARK: The fastest PyTorch runtime – 3x over TorchScript, 1.6x over TF/XLA, 43% faster than ONNXRuntime

Introducing SHARK – a high-performance PyTorch runtime that is 3x faster than PyTorch/TorchScript, 1.6x faster than TensorFlow+XLA, and 43% faster than ONNXRuntime on the Nvidia A100. All of this is available to deploy seamlessly in minutes, whether you are using Docker, Kubernetes, or plain old `pip` […]

Read More

torch-mlir: Bridging PyTorch and LLVM/MLIR ecosystems

We presented the torch-mlir project today at the LLVM/MLIR Open Design Meeting with more than 125 attendees from industry. This is an important piece of the next-generation AI software stack, bridging the ubiquity of the PyTorch ecosystem to the LLVM/MLIR ecosystem and unlocking building performant, reusable and […]

Read More

Analysis of the Huggingface Infinity Inference Engine

We love Huggingface and use it a lot; it has made NLP models so much easier to use. They recently released an enterprise product, an inference solution that packages all the software magic for a hardware deployment in a Docker container: https://huggingface.co/infinity. Performance of ML systems is close […]

Read More

Generating code to outperform native MatMul libraries (Accelerate, BLIS, MKL) and measuring it with MMperf

GEMM operations dominate the computation in modern machine learning models. Silicon vendors typically provide hand-optimized GEMM libraries such as Apple's Accelerate framework [1], AMD's BLIS [2], and Intel's MKL [3]. There are also open-source implementations like OpenBLAS [4], BLIS [5], and RUY [6]. We will demonstrate the performance of Nod.ai's compiler-generated code outperforming […]
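To make the baseline concrete, here is a minimal sketch (not Nod.ai's generated code or any vendor library) of the GEMM computation C = A·B as a naive triple loop. Hand-optimized libraries beat this by orders of magnitude through tiling, vectorization, and multithreading, which is exactly the gap compiler-generated code must close.

```python
def naive_gemm(A, B):
    """Naive GEMM: C = A @ B for row-major nested lists."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a = A[i][p]          # hoist A[i][p] out of the inner loop
            for j in range(n):   # i-p-j order walks B's rows contiguously
                C[i][j] += a * B[p][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(naive_gemm(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

The loop order and the hoisted scalar are the kind of memory-access decisions that optimized GEMM kernels take much further with cache blocking and SIMD.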

Read More