Running TensorFlow BERT-Large with AWS Neuron

This example shows a Neuron-compatible BERT-Large implementation that is functionally equivalent to the open-source BERT-Large model. The demo uses TensorFlow-Neuron with BERT-Large weights fine-tuned for MRPC, and it also shows the performance achieved on an Inf1 instance. Users who want to use public BERT SavedModels should also follow the steps described below.

Table of Contents

  1. Launch EC2 instances
  2. Compiling Neuron compatible BERT-Large
    • Update compilation EC2 instance
    • Compile open source BERT-Large saved model using Neuron compatible BERT-Large implementation
  3. Running the inference demo
    • Update inference EC2 instance
    • Launching the BERT-Large demo server
    • Sending requests to server from multiple clients