This tutorial should cost less than 0.05 credits ($0.05) if you use the RTX 2080 Ti instance type and the same job settings as the guide.
Before beginning this example, ensure that you have satisfied the following prerequisites.
- A valid trainML account with a non-zero credit balance.
- A python virtual environment with the trainML CLI/SDK installed and configured.
- A current version of Node.js installed.
Understanding the Model
The model code used for the endpoint can be found here. The code is largely adapted from code published by LearnOpenCV. It uses a pre-trained VGG model to perform the image classification, so we can use it directly instead of running model training (model training is covered in other tutorials). All the code is contained in the predict.py file of the repository. The predict_image function is where the actual inference occurs. It takes the name of an image file that must exist on the local file system as its only parameter and returns an array of the top 5 predicted classes for the image with their confidence scores.
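For orientation, a minimal sketch of calling this function directly (assuming the repository's dependencies are installed and an example image is on disk; the exact return structure is defined in predict.py):

```python
# Hypothetical direct usage of the inference function described above.
from predict import predict_image

# Returns the top 5 predicted classes with their confidence scores;
# see predict.py for the exact return structure.
results = predict_image("images/pizza.jpg")
print(results)
```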
For any REST endpoint, any data used for inference must be provided as part of the HTTP request. Inference data is most commonly sent as a POST request, since this is the method intended for the client to send new data to the server. The most common request body type when integrating between backend systems is JSON. Since this JSON payload is the only way to get the inference data from the client to the server, the model code must be able to translate the allowable JSON data types into the data format the model expects (in this case, a binary file). Binary data is not permitted in JSON, so the most common way to send binary data in a JSON payload is through base64 encoding.
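As an illustration, a client could prepare such a payload in Python like this (the filename key is a placeholder; the actual parameter names are configured when the route is created below):

```python
import base64
import json

# Read the binary image and encode it as a base64 string so it can be
# embedded in a JSON request body (JSON cannot carry raw binary data).
with open("images/pizza.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# "filename" is illustrative; "contents" matches the route parameter
# used later in this tutorial.
payload = json.dumps({"filename": "pizza.jpg", "contents": encoded})
```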
Since this function is also used for automated batch inference, the function itself cannot change. Instead, there is a helper function predict_base64_image that takes a file name and the base64-encoded image as parameters. This function creates a temporary file, writes the decoded image contents to the file, runs the predict_image function on the temporary file, and returns a dictionary with the name of the file and its results. Using this helper, the same model code can process endpoint requests as well as perform batch inference.
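Based on the description above, the helper looks roughly like this (a sketch, not the repository's exact code; the parameter names and return keys may differ from predict.py):

```python
import base64
import os
import tempfile

# Assumes predict_image from predict.py is in scope.
def predict_base64_image(filename, contents):
    # Decode the base64 payload back into raw image bytes.
    decoded = base64.b64decode(contents)
    # Write the bytes to a temporary file so the unchanged predict_image
    # function can read the image from the local file system.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(decoded)
        tmp_path = tmp.name
    try:
        results = predict_image(tmp_path)
    finally:
        os.remove(tmp_path)
    # Pair the original file name with its results so batch output can be
    # matched back to its input.
    return {"name": filename, "results": results}
```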
Creating the trainML Endpoint
Now that the model code is ready to accept JSON payload data, the next step is to create the trainML Endpoint. You can do this from the Endpoint Dashboard or by running python deploy_endpoint.py in the tutorial code repository using a python virtual environment with the trainML CLI/SDK installed and configured.
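For reference, here is a condensed sketch of what a deployment script like deploy_endpoint.py might contain, using the trainML SDK. The import path, keyword arguments, and route fields shown here are assumptions; consult the actual script and the SDK documentation for the authoritative version.

```python
import asyncio
from trainml import TrainML

async def main():
    trainml = TrainML()
    # Create an endpoint job from the tutorial's git repository.
    job = await trainml.jobs.create(
        name="simple-tensorflow-classifier",  # illustrative name
        type="endpoint",
        gpu_type="RTX 2080 Ti",
        gpu_count=1,
        disk_size=10,
        model=dict(
            source_type="git",
            source_uri="https://github.com/trainML/simple-tensorflow-classifier.git",
        ),
        endpoint=dict(
            routes=[
                dict(
                    path="/predict",
                    verb="POST",
                    file="predict.py",
                    function="predict_base64_image",
                    body=[
                        dict(name="filename", type="str"),  # name is an assumption
                        dict(name="contents", type="str"),
                    ],
                )
            ]
        ),
    )
    # Wait until the endpoint is running before using it.
    await job.wait_for("running")

asyncio.run(main())
```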
If using the web interface, click the Create button on the Endpoint Dashboard. Enter a memorable name, select the RTX 2080 Ti GPU Type, and leave the rest of the resources section as default. In the Model section, select git as the Source Type and enter https://github.com/trainML/simple-tensorflow-classifier.git as the repository URL.
Endpoints allow you to configure "Routes" instead of workers. Routes allow you to direct HTTP requests to your model code. Click the Add Route button to add a new route. Keep POST as the HTTP Verb and specify /predict as the Path. Specify predict.py as the File Name and predict_base64_image as the Function Name. This will instruct the trainML endpoint to call the predict_base64_image method of the predict.py file anytime someone makes a POST request to the /predict URL path of the endpoint.
Since the function expects two parameters, click Add Parameter twice and enter the function's parameter names (the file name parameter and contents), with String as the data type for each. Neither parameter is optional. This body template both defines the acceptable JSON payload the user can post to this route and maps the payload to the function arguments. If a request's body does not match the template, the request will be rejected. Click Next to review your endpoint configuration and Create to deploy the endpoint.
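With the body template in place, a valid request body for this route looks like the following (the filename key is illustrative; it must match the file name parameter you entered above):

```json
{
  "filename": "pizza.jpg",
  "contents": "<base64-encoded image data>"
}
```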
Once the endpoint is running, click Connect to get the address of the new endpoint. If you used the script instead of the web interface, it will have printed the endpoint URL to the terminal.
Getting Inference Results
The deployed endpoint can be accessed the same way you would access any other web server. Two examples are provided in the tutorial code repository: using curl from the command line and using a React client application in the browser.
Using the Command Line
To get a prediction for the endpoint from the command line, you can use the
classify_image.sh bash script in the repository. Simply specify the endpoint address from above and the path to the image you want to classify as the two arguments. For example:
./classify_image.sh https://<job_id>.trainml.cloud ./images/pizza.jpg
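The script sends the image to the /predict route of the endpoint. If you prefer Python to bash, a rough equivalent can be sketched with the requests library (the filename key is an assumption; match it to the body template you configured):

```python
import base64
import sys

import requests  # third-party: pip install requests

# Usage: python classify_image.py https://<job_id>.trainml.cloud ./images/pizza.jpg
endpoint, image_path = sys.argv[1], sys.argv[2]

# Base64-encode the image so it can travel inside the JSON body.
with open(image_path, "rb") as f:
    contents = base64.b64encode(f.read()).decode("utf-8")

# POST to the /predict route configured on the endpoint.
response = requests.post(
    f"{endpoint}/predict",
    json={"filename": image_path, "contents": contents},
)
print(response.json())
```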
Using a Browser
Open the file
front-end/src/config.js with a text editor. Change the
api_address value to the endpoint URL from the previous step and save the file. If you changed the route path when creating the endpoint, you must also update it here.
Go to the
front-end folder of the repository in a terminal window and type
npm start. This will open a web browser to http://localhost:3000 and load the example front end. Click the
Upload File button and select an image to classify (example image files are in the
images folder of this repository). Click
Get Prediction to send the file to the endpoint. When the response comes back, the list of the top five class categories and their confidence ratings will be displayed on the right. Click
Upload New File and
Get Prediction on additional images as desired.
Like Notebooks, Endpoints will continue to run until stopped, incurring GPU costs for the time they are in the running state. When you are done with the endpoint, select it from the Endpoint Dashboard and click
Stop. Endpoints can be restarted at any time and incur storage charges for the duration of their existence. Click
Terminate to permanently delete the endpoint.
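If you deployed with the SDK, the endpoint can also be stopped and removed programmatically. A minimal sketch, assuming the illustrative job name from the deployment sketch above; verify the job methods against the SDK documentation:

```python
import asyncio
from trainml import TrainML

async def cleanup():
    trainml = TrainML()
    # Find the endpoint job by the illustrative name used earlier.
    jobs = await trainml.jobs.list()
    job = next(j for j in jobs if j.name == "simple-tensorflow-classifier")
    await job.stop()               # stops GPU billing; storage charges continue
    await job.wait_for("stopped")
    await job.remove()             # permanently deletes the endpoint

asyncio.run(cleanup())
```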