
The Nod.ai team has been hard at work and is ready with a Summer release.
Release Notes:
Model Support:
- Hundreds of new models and model variants added to the SHARK tank.
- Continuous Integration tests each supported model variant on each supported hardware backend (Intel / AMD CPUs, Nvidia A100, AMD MI100, Apple Silicon CPUs and Apple Silicon GPUs).
- Upcoming models of interest: OPT/GPT-3, DLRM and V-Diffusion.
- torch-mlir enhancements: PyTorch source builds, custom op support and more TorchBench models.
- The nod.ai team is the largest contributor to torch-mlir, as shown below.
Deployment:
- Added support for Nvidia Triton Inference server. SHARK models can now be deployed with Triton Inference Server.
- Reduced the number of dependent packages, making installation without the importer tools easier and faster.
- Downloadable .mlir files from the SHARK tank in lieu of local importing.
Performance:
- Low-level MMA codegen targeting Tensor Cores on Nvidia A100 and newer GPUs.
- Added swizzle and split-K tuning support.
- Now nearly 2x faster than ONNX Runtime on MiniLM BERT (1.36 ms vs. 2.54 ms).
- Matmul performance on par with or better than CUTLASS / cuBLAS for BERT sizes.
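Split-K, one of the tuning options listed above, splits the reduction (K) dimension of a matmul into chunks whose partial products can be computed independently (on a GPU, typically by separate thread blocks) and then summed. The following is a minimal pure-Python sketch of that idea, assuming plain nested lists as matrices. The function names and the `splits` parameter are hypothetical illustrations, not SHARK's or IREE's API; the actual codegen works on GPU tiles, not Python lists.

```python
from typing import List

Matrix = List[List[float]]


def matmul_k_range(a: Matrix, b: Matrix, k_start: int, k_end: int) -> Matrix:
    """Partial product of a (M x K) and b (K x N) over reduction indices [k_start, k_end)."""
    m, n = len(a), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for k in range(k_start, k_end):
                acc += a[i][k] * b[k][j]
            out[i][j] = acc
    return out


def split_k_matmul(a: Matrix, b: Matrix, splits: int) -> Matrix:
    """Hypothetical split-K matmul: one partial product per K-chunk, then a reduction."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    k = len(b)
    m, n = len(a), len(b[0])
    chunk = max(1, -(-k // splits))  # ceiling division so every k index is covered
    # Each chunk is independent; on a GPU these would run in parallel.
    partials = [
        matmul_k_range(a, b, start, min(start + chunk, k))
        for start in range(0, k, chunk)
    ]
    # Reduction step: sum the partial results element-wise.
    result = [[0.0] * n for _ in range(m)]
    for p in partials:
        for i in range(m):
            for j in range(n):
                result[i][j] += p[i][j]
    return result


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    print(split_k_matmul(a, b, splits=2))  # [[12.0, 1.0], [28.0, 5.0]]
```

The trade-off this illustrates: splitting K exposes more parallelism when M and N are small relative to K, at the cost of an extra reduction over the partial results.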


Developer Tools:
- Added support for SHARK Eagermode
- Added support for SHARK + TorchDynamo
- Added support for running `pytest` with `--benchmark` to benchmark against native frameworks.
- pip-installable SHARK auto-tuner.
- Added SHARK Discord server for real-time access to developers
Training and Finetuning:
- Added Python examples for training.
- Jupyter notebook example for fine-tuning BERT.
SHARK Hardware Support:
- Apple M1, M1 Max/Ultra and M2 support now runs on CI.
- AMD MI100 (MFMA is WIP).
- NVIDIA A100 CUDA and Vulkan on CI.
- Intel Level Zero (XMX/DPAS is WIP).
Download SHARK from https://github.com/nod-ai/SHARK