For this demo, launch two EC2 instances: a c5.4xlarge instance for compilation and an Inf1 instance for inference.
For both instances, choose the latest Ubuntu 18 Deep Learning AMI (DLAMI).
First, connect to the c5.4xlarge compilation instance and update tensorflow-neuron and neuron-cc to the latest Neuron software by executing the following commands:
source activate aws_neuron_tensorflow_p36
conda install numpy=1.17.2 --yes --quiet
conda update tensorflow-neuron
Note: if your tensorflow-neuron version on the inference instance is lower than 1.15.0.1.0.1333.0, you will need to run this demo on an inf1.2xlarge instead of an inf1.xlarge.
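The note above involves comparing multi-part version strings. A minimal shell sketch of such a check is shown below; the installed version is a placeholder (in practice, read it with pip show tensorflow-neuron inside the activated environment), and the minimum value assumed here is taken from the note:

```shell
# Sketch: compare the installed tensorflow-neuron version against a required
# minimum. The "installed" value is a placeholder; on a real instance, read it
# inside the aws_neuron_tensorflow_p36 environment, e.g.:
#   pip show tensorflow-neuron | awk '/^Version/{print $2}'
version_ge() {
    # True if $1 >= $2 under natural version ordering (GNU sort -V)
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

installed="1.15.0.1.0.1333.0"   # placeholder value
required="1.15.0.1.0.1333.0"    # minimum version assumed from the note above

if version_ge "$installed" "$required"; then
    echo "inf1.xlarge is sufficient"
else
    echo "use inf1.2xlarge instead"
fi
```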
Neuron software works with TensorFlow saved models. Users should bring their own BERT-Large saved model for this section. This demo runs inference for the Microsoft Research Paraphrase Corpus (MRPC) task, so the saved model should be fine-tuned for MRPC. Users who need additional help fine-tuning the model for MRPC or creating a saved model can refer to Appendix 1.
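Before compiling, it can be useful to confirm the saved model loads and to inspect its serving signature. One way to do this is with TensorFlow's saved_model_cli tool, which ships with the TensorFlow environment on the DLAMI; the model path below is a placeholder:

```shell
# Inspect the serving signature of the saved model before compiling
# (MODEL_DIR is a placeholder; point it at your BERT-Large saved model).
MODEL_DIR="/path/to/user/bert-large/savedmodel"

if command -v saved_model_cli >/dev/null 2>&1; then
    saved_model_cli show --dir "$MODEL_DIR" --tag_set serve --signature_def serving_default \
        || echo "could not read $MODEL_DIR; check the path"
else
    echo "saved_model_cli not found; run 'source activate aws_neuron_tensorflow_p36' first"
fi
```

The output lists the signature's input and output tensors, which is a quick sanity check that the model was exported correctly.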
In the same conda environment, and in the directory containing the bert_demo scripts, run the following:
export BERT_LARGE_SAVED_MODEL="/path/to/user/bert-large/savedmodel"
python bert_model.py --input_saved_model $BERT_LARGE_SAVED_MODEL --output_saved_model ./bert-saved-model-neuron --batch_size=6 --aggressive_optimizations
This compiles the BERT-Large model pointed to by $BERT_LARGE_SAVED_MODEL for an input size of 128 and a batch size of 6. The compilation output is stored in bert-saved-model-neuron. Copy this directory to your Inf1 instance for inference.
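One way to transfer the compiled model is to bundle the output directory and copy it over with scp; the sketch below assumes bert-saved-model-neuron from the previous step is in the current directory, and the key path and host are placeholders:

```shell
# Bundle the compiled model for transfer to the Inf1 instance
# (assumes ./bert-saved-model-neuron exists from the compilation step).
if [ -d bert-saved-model-neuron ]; then
    tar -czf bert-saved-model-neuron.tgz bert-saved-model-neuron
    echo "created bert-saved-model-neuron.tgz"
else
    echo "bert-saved-model-neuron not found; run the compilation step first"
fi

# Then copy it to the Inf1 instance (key path and host are placeholders):
#   scp -i /path/to/key.pem bert-saved-model-neuron.tgz ubuntu@<inf1-host>:~/
```

On the Inf1 instance, unpack it with tar -xzf bert-saved-model-neuron.tgz before running inference.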
The bert_model.py script encapsulates all the steps necessary for this process. For details on what bert_model.py does, please refer to Appendix 2.