AWS Inferentia Overview

AWS Inferentia chips are a new class of deep learning inference hardware built to accelerate inference while reducing cost. They enable complex neural network models, created and trained in popular frameworks such as TensorFlow, PyTorch, and MXNet, to be executed on AWS Inferentia-based Amazon EC2 Inf1 instances.

Amazon EC2 Inf1 instances are the first generation of instance types based on AWS Inferentia chips. They feature up to 16 AWS Inferentia chips, 2nd generation Intel Xeon Scalable processors, and up to 100 Gbps networking.

AWS Neuron is a software development kit (SDK) for running machine learning inference on AWS Inferentia chips. It consists of a compiler, a runtime, and profiling tools that enable developers to run high-performance, low-latency inference on AWS Inferentia-based Inf1 instances. AWS Neuron gives developers the flexibility to train their machine learning models in any popular framework, such as TensorFlow, PyTorch, and MXNet, and then run them optimally on Amazon EC2 Inf1 instances. The AWS Neuron SDK comes pre-installed in AWS Deep Learning AMIs, and will also be available pre-installed in AWS Deep Learning Containers soon. Neuron allows you to keep training in 32-bit floating point for best accuracy and automatically converts the 32-bit trained model to run at 16-bit speed using the bfloat16 format.
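To make the 32-bit to bfloat16 conversion more concrete, here is a minimal sketch of the number format itself. It is not Neuron's implementation: the conversion on Inf1 happens inside the Neuron compiler and hardware, and the helper names below (float32_to_bfloat16_bits, bfloat16_bits_to_float, round_to_bfloat16) are hypothetical, written for illustration only. The sketch relies on the fact that bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of a 32-bit float, and it assumes round-to-nearest-even when dropping the lower 16 bits.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Return the 16-bit bfloat16 pattern for a value (illustrative only).

    Assumes the value fits in a 32-bit float; struct raises OverflowError
    for finite values outside that range.
    """
    if value != value:
        # NaN: return a canonical quiet NaN pattern instead of rounding it.
        return 0x7FC0
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Round to nearest, ties to even, on the 16 bits about to be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 pattern back to a float by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def round_to_bfloat16(value: float) -> float:
    """Return the value as it would be represented after a bfloat16 round trip."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(value))


if __name__ == "__main__":
    for sample in (1.0, 3.14159, -0.001, 65504.0):
        print(sample, "->", round_to_bfloat16(sample))
```

Because bfloat16 keeps the full 8-bit exponent of a 32-bit float, the range of representable values is preserved and only precision is reduced. For example, 3.14159 becomes 3.140625 after the round trip.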

Learn how you can lower ML inference operational costs in the cloud using AWS Inferentia.


We have divided the workshop into four main examples that highlight how Inferentia can integrate with a larger training and deployment pipeline using popular perception and language models. The examples are progressively more advanced, and each builds upon concepts from the previous one. If you are new to AWS, start with Resnet50 on EC2.

  • Resnet50 on EC2 - We start with a simple example: taking the ResNet50 model, compiling it with Neuron, and running inference on a sample image.

  • SageMaker on Inf1 - Inf1 instances are supported in SageMaker. In this workshop we go through an example of how to run training and inference using the SageMaker SDK and Jupyter notebooks.

  • OpenPose Inf1 - In this example we show how we ran an OpenPose model on Inf1.

  • BERT Inferencing on Inf1 - Here you can explore how we take a large model like BERT and use multiple Neuron cores to take full advantage of Inf1.