Distributed ML Runtime

Posts relating to Distributed ML Training and Inference

SHARK: The fastest PyTorch runtime – 3x faster than TorchScript, 1.6x faster than TF/XLA, and 76% faster than ONNXRuntime

Introducing SHARK – a high-performance PyTorch runtime that is 3x faster than PyTorch/TorchScript, 1.6x faster than TensorFlow+XLA, and 76% faster than ONNXRuntime on the NVIDIA A100. All of this is available to deploy seamlessly in minutes, whether you are using Docker, Kubernetes, or plain old `pip` […]
