Compiling and Running on Inferentia

Compile the pre-trained Keras ResNet-50 model, exported in SavedModel format, for the Inferentia chip, and then perform image classification on a Neuron core. The compiled model is also written out in SavedModel format.

SavedModel is TensorFlow's standard interchange format for models; it stores the model weights and the network structure together.
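
The steps below assume the resnet50 SavedModel directory has already been exported. If you still need to create it, the following is a minimal sketch of one way to do so (TF 1.x style, since the tensorflow-neuron 1.x packages and the tf.contrib predictor API used later are built on TF 1.x); the 'input'/'output' signature keys are assumptions that match what the inference script in Step 3 feeds and reads:

import shutil
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50

# Export in inference mode
tf.keras.backend.set_learning_phase(0)
model = ResNet50(weights='imagenet')

# Write the SavedModel with 'input'/'output' signature keys,
# matching the feed dict used by the inference script in Step 3
model_dir = 'resnet50'
shutil.rmtree(model_dir, ignore_errors=True)
tf.saved_model.simple_save(
    session=tf.keras.backend.get_session(),
    export_dir=model_dir,
    inputs={'input': model.inputs[0]},
    outputs={'output': model.outputs[0]})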

Step 1. Create a Python script for compiling the model

Create a Python script named compile_resnet50.py with the following content:

import shutil
import tensorflow.neuron as tfn

model_dir = 'resnet50'

# Prepare export directory (old one removed)
compiled_model_dir = 'resnet50_neuron'
shutil.rmtree(compiled_model_dir, ignore_errors=True)

# Compile using Neuron
tfn.saved_model.compile(model_dir, compiled_model_dir)

Step 2. Run the compilation script

Run the compilation script; it takes a minute or two on an inf1.2xlarge instance.

time python compile_resnet50.py  
....
Compiler status PASS
INFO:tensorflow:Number of operations in TensorFlow session: 4638
INFO:tensorflow:Number of operations after tf.neuron optimizations: 876
INFO:tensorflow:Number of operations placed on Neuron runtime: 874
INFO:tensorflow:Successfully converted resnet50 to resnet50_neuron

real    1m9.173s
user    0m56.236s
sys     0m2.773s

The Neuron compiler performs ahead-of-time (AOT) compilation. Compared to a just-in-time (JIT) compiler or a system that compiles at first inference, this saves time when deploying to multiple instances: the model is compiled once and the resulting SavedModel is simply copied to each instance. The Neuron compiler also automatically fuses operators to optimize scheduling and memory management.
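
Because compilation happened ahead of time, loading the compiled model later does not invoke the Neuron compiler again. A small sketch that times the load, using the same predictor API as Step 3 below:

import time
import tensorflow as tf

# Loading the pre-compiled SavedModel does not trigger recompilation;
# the Neuron-compiled artifact is embedded in the model directory
start = time.time()
predictor = tf.contrib.predictor.from_saved_model('resnet50_neuron')
print('Load time: %.1f s' % (time.time() - start))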

Step 3. Create a Python script for inference

Create a Python script named infer_resnet50_neuron.py with the following content. The script loads the model compiled in Step 2.

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

# Load model
compiled_model_dir = 'resnet50_neuron'
predictor_inferentia = tf.contrib.predictor.from_saved_model(compiled_model_dir)

# Create input from image
img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
img_arr = image.img_to_array(img_sgl)
img_arr2 = np.expand_dims(img_arr, axis=0)
img_arr3 = preprocess_input(img_arr2)

# Run inference and display the results
model_feed_dict = {'input': img_arr3}
infa_rslts = predictor_inferentia(model_feed_dict)
print(decode_predictions(infa_rslts["output"], top=5)[0])

Step 4. Run the inference script

Run infer_resnet50_neuron.py to perform inference on the Neuron cores.

python infer_resnet50_neuron.py

You will get almost the same result as when running the model on the CPU.

[('n02123045', 'tabby', 0.68817204), ('n02127052', 'lynx', 0.12701613), ('n02123159', 'tiger_cat', 0.08736559), ('n02124075', 'Egyptian_cat', 0.063844085), ('n02128757', 'snow_leopard', 0.009240591)]
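
To check that claim yourself, you can load the original, uncompiled resnet50 SavedModel with the same predictor API and compare the top-5 predictions. A minimal sketch, assuming the resnet50 directory from the export step is still present with the 'input'/'output' signature keys:

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

# Load the original (uncompiled) SavedModel; this runs on the CPU
predictor_cpu = tf.contrib.predictor.from_saved_model('resnet50')

# Same preprocessing as the Inferentia script
img = image.load_img('kitten_small.jpg', target_size=(224, 224))
arr = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# The top-5 labels should match the Neuron output, with only tiny
# differences in the probabilities
cpu_rslts = predictor_cpu({'input': arr})
print(decode_predictions(cpu_rslts["output"], top=5)[0])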