Deploying an Image Classification Endpoint

This tutorial should cost less than 0.05 credits ($0.05) if you use the RTX 2080 Ti instance type and the same job settings as the guide.

Prerequisites

Before beginning this example, ensure that you have satisfied the following prerequisites.

Understanding the Model

The model code used for the endpoint can be found in the tutorial code repository at https://github.com/trainML/simple-classifier-endpoint.git. The code is largely adapted from code published by LearnOpenCV. It uses a pre-trained VGG model to perform the image classification, so we can use it directly instead of running model training (model training is covered in other tutorials). All the code is contained in the predict.py file of the repository. The predict_image function is where the actual inference occurs. It takes the name of an image file that must exist on the local file system as its only parameter and returns an array of the top 5 predicted classes for the image along with their confidence scores.
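The real implementation lives in predict.py of the tutorial repository; the following is only a rough sketch of what such a function could look like, assuming a pre-trained VGG-16 from torchvision (the preprocessing, labels, and return format here are illustrative and may differ from the actual repository code).

```python
# Illustrative sketch only -- the real implementation is in predict.py of the
# tutorial repository and may differ in its preprocessing and output details.
import torch
from PIL import Image
from torchvision import models

# Load the pre-trained VGG-16 weights once at import time so each request
# only pays the cost of inference, not of model loading.
weights = models.VGG16_Weights.DEFAULT
model = models.vgg16(weights=weights)
model.eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]


def predict_image(file_name):
    """Return the top 5 predicted classes and confidences for a local image file."""
    image = Image.open(file_name).convert("RGB")
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probabilities = torch.nn.functional.softmax(model(batch)[0], dim=0)
    top5 = torch.topk(probabilities, 5)
    return [
        dict(label=categories[idx], score=float(score))
        for score, idx in zip(top5.values, top5.indices)
    ]
```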

For any REST endpoint, any data used for inference must be provided as part of the HTTP request. Inference data is most commonly sent as a POST request, since this is the method intended for the client to send new data to the server. The most common request body type when integrating between backend systems is JSON. Since this JSON payload is the only way to get the inference data from the client to the server, the model code must be able to translate the allowable JSON data types into the format the model expects (in this case, a binary file). Binary data is not permitted in JSON, so the most common way of sending binary data in a JSON payload is through base64 encoding.
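For example, a client could package an image file into a JSON-safe payload like this (the name and contents keys anticipate the route parameters configured later in this tutorial):

```python
import base64
import json

# Read the binary image and base64-encode it so it can travel inside a JSON string.
with open("images/pizza.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload = json.dumps({"name": "pizza.jpg", "contents": encoded})
```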

Since this function is also used for automated batch inference, the function itself cannot change. Instead, there is a helper function predict_base64_image that takes a file name and the base64 encoded image as parameters. This function creates a temporary file, writes the decoded image contents to it, runs the predict_image function on the temporary file, and returns a dictionary with the name of the file and its results.
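Again, the actual helper is defined in predict.py; a minimal sketch of that pattern, assuming the predict_image function shown above, could look like this:

```python
import base64
import os
import tempfile

def predict_base64_image(name, contents):
    """Decode a base64 image, run predict_image on it, and return the results."""
    # Write the decoded bytes to a temporary file so predict_image can read it
    # from the local file system, exactly as it does during batch inference.
    suffix = os.path.splitext(name)[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(base64.b64decode(contents))
        tmp_path = tmp.name
    try:
        results = predict_image(tmp_path)
    finally:
        os.remove(tmp_path)
    return {"name": name, "results": results}
```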

Using this function, the same model code can now serve endpoint requests as well as perform batch inference.

Creating the Endpoint

Now that the model code is ready to accept JSON payload data, the next step is to create the trainML Endpoint. You can do this from the Endpoint Dashboard or by running python deploy_endpoint.py in the tutorial code repository using a Python virtual environment with the trainML CLI/SDK installed and configured.

If using the web interface, click the Create button on the Endpoint Dashboard. Enter a memorable name, select the RTX 2080 Ti GPU Type and leave the rest of the resources section as default. In the Model section, select git as the Source Type and enter https://github.com/trainML/simple-classifier-endpoint.git as the Source Uri.

Endpoints allow you to configure "Routes" instead of workers. Routes allow you to direct HTTP requests to your model code. Click the Add Route button to add a new route. Keep POST as the HTTP Verb and specify /predict as the Path. Specify predict.py as the File Name and predict_base64_image as the Function Name. This instructs the trainML endpoint to call the predict_base64_image method of the predict.py file anytime someone makes a POST request to the /predict URL path of the endpoint.

Since the function expects two parameters, name and contents, click Add Parameter twice, enter name and contents as the parameter names, and select String as the data type for both. Neither parameter is optional. This body template both defines the acceptable JSON payload a user can post to this route and maps the payload to the function arguments. If a request's body does not match the template, the request will be rejected. Click Next to review your endpoint configuration and Create to deploy the endpoint.
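For the route configured above, a request body with the following shape (shown here as a Python dict for illustration) would be accepted, with each top-level key passed to predict_base64_image as the argument of the same name:

```python
# Body template for POST /predict: keys map to predict_base64_image's arguments.
body = {
    "name": "pizza.jpg",                          # file name, used to label the response
    "contents": "<base64-encoded image data>",    # the image itself, base64 encoded
}
```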

Once the endpoint is running, click Connect to get the endpoint address for the new endpoint. If you used the script instead of the web interface, it will have printed the endpoint URL to the terminal.

Getting Inference Results

The deployed endpoint can be accessed the same way you would access any other web server. Two examples are provided in the tutorial code repository: using curl from the command line, and using a React client application in the browser.

Using the Command Line

To get a prediction from the endpoint on the command line, you can use the classify_image.sh bash script in the repository. Simply specify the endpoint address from above and the path to the image file you want to classify as the two arguments. For example:

./classify_image.sh https://<endpoint address> ./images/pizza.jpg
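If you prefer Python to curl, a roughly equivalent request could look like the sketch below. The /predict path and the name/contents payload match the route configured earlier; the endpoint address placeholder must be replaced with the address from the Connect dialog, and the requests package is assumed to be installed.

```python
import base64
import requests

endpoint = "https://<endpoint address>"  # replace with your endpoint address

# Build the same JSON payload the curl script sends: file name plus base64 contents.
with open("images/pizza.jpg", "rb") as f:
    contents = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    f"{endpoint}/predict",
    json={"name": "pizza.jpg", "contents": contents},
)
response.raise_for_status()
print(response.json())  # top 5 predicted classes and their confidence scores
```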

Using a Browser

Open the file front-end/src/config.js with a text editor. Change the api_address value to the endpoint URL from the previous step and save the file. If you changed the route path in the previous step, you must also update that here.

Go to the front-end folder of the repository in a terminal window and type npm start. This will open a web browser to http://localhost:3000 and load the example front end. Click the Upload File button and select an image to classify (example image files are in the images folder of the repository). Click Get Prediction to send the file to the endpoint. When the response comes back, the list of the top five class categories and their confidence ratings will be displayed on the right. Click Upload New File and Get Prediction to classify additional images as desired.

Cleaning Up

Like Notebooks, Endpoints will continue to run until stopped, incurring GPU costs for the time they are in the running state. When you are done with the endpoint, select it from the Endpoint Dashboard and click Stop. Endpoints can be restarted at any time and incur storage charges for the duration of their existence. Click Terminate to permanently delete the endpoint.