
REST Endpoints for Inference


The trainML platform has been extended to support deploying models as REST API endpoints. These fully managed endpoints give you the real-time predictions you need for production applications without having to worry about servers, certificates, networking, or web development.

How It Works

trainML Endpoints allow you to expose your model code as a REST API endpoint. The endpoint runs a webserver on a publicly accessible URL that will call functions inside the model code you supply based on the configuration you specify. Once you have a trained model with inference code saved as a trainML Model, in a git repository, or on your local computer, you can configure a trainML Endpoint to run that inference code.

Endpoints have one or more "routes", which are defined by the URL path and HTTP verb of the request that the endpoint should accept. For example, a POST request to path /predict is a separate route from a GET request to the same path, and can therefore call different functions with different parameters. For each route, you specify the Python file that contains the code to be called as well as the method within that file to call.

The input data for the prediction is provided in the HTTP request body (only POST requests are currently supported). Each route allows you to configure the expected Request Body Template. This body template both validates that the request body is properly formatted and maps the request body attributes to the arguments the specified function expects. If the function uses positional arguments, simply define the request body with the attributes in the same order as the function expects them. If the function uses keyword arguments, define the request body to have attributes with the same names as the function arguments.
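
For illustration, a minimal inference module for a POST /predict route might look like the sketch below. The file name predict.py, the function predict_file, and its filename argument are placeholders; they only need to match whatever you put in the route configuration.

# predict.py (hypothetical example): the route's file setting would point at
# this module and its function setting at predict_file.
def predict_file(filename):
    # "filename" is a single positional argument, so a body template with one
    # attribute (e.g. {"filename": "image.jpg"}) maps to it by position.
    # Replace this stub with your model's actual inference code.
    return {"filename": filename, "prediction": "placeholder"}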

Once the endpoint is created and running, the platform returns the hostname at which the endpoint can be reached. Use any HTTP client (e.g. curl, Postman, etc.) to query the configured route and get the response. All endpoints also define a GET method on the /ping path that can be used for health checks.
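
For example, assuming the endpoint's hostname is https://example-endpoint.trainml.ai (substitute the URL shown for your endpoint) and the POST /predict route sketched above, a query from Python might look like:

import requests

BASE_URL = "https://example-endpoint.trainml.ai"  # placeholder; use your endpoint's URL

# Health check: every endpoint answers GET requests on /ping.
print(requests.get(f"{BASE_URL}/ping").status_code)

# Call the configured POST /predict route; the JSON body must match the
# route's Request Body Template.
response = requests.post(f"{BASE_URL}/predict", json={"filename": "image.jpg"})
print(response.json())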

tip

If the endpoint becomes overloaded, it will return HTTP 503 errors. If you are calling the endpoint programmatically, be sure to check for this error response and retry the request using a backoff algorithm.
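
As a rough sketch (the helper below and its retry parameters are illustrative, not part of the trainML API), a retry loop with exponential backoff might look like:

import time

import requests


def post_with_backoff(url, payload, max_retries=5, base_delay=1.0):
    # Retry POST requests that fail with HTTP 503, doubling the wait each attempt.
    for attempt in range(max_retries):
        response = requests.post(url, json=payload)
        if response.status_code != 503:
            return response
        time.sleep(base_delay * (2 ** attempt))
    response.raise_for_status()  # all retries exhausted; surface the 503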

Endpoints will run until stopped and are billed the same way as other job types.

Using the Web Platform

Navigate to the Endpoint Dashboard and click the Create button. Specify the name and required resources for the endpoint. In the Models section, specify the trainML Model, git repository, or local computer directory that contains the trained model and inference code.

In the Endpoint section, click Add Route. Keep POST as the HTTP Verb and specify /predict as the Path. Specify the file within the model code to use as the File Name (e.g. predict.py) and the method to call as the Function Name (e.g. predict_file). Click Add Parameter to configure the information that will be provided in the HTTP request body when the endpoint is called. These parameters must exactly match the arguments the specified function is expecting. If the function is expecting positional arguments, ensure the Function Uses Positional Arguments box is checked.

Specify the expected Data Type for each parameter. String, Integer, Float, Boolean, Object, and List data types are supported. If the parameter is optional, check the Optional box and specify a Default Value. The default value must be valid Python (e.g. None should be used rather than null or undefined). See FastAPI's request body documentation for more details.
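
For instance, a hypothetical route whose function uses keyword arguments could pair a body template of filename (String, required) and top_k (Integer, optional, Default Value None) with a function like this sketch:

# Hypothetical function matching a body template with a required "filename"
# (String) parameter and an optional "top_k" (Integer) parameter whose
# Default Value is the Python literal None.
def predict_file(filename, top_k=None):
    # Replace this stub with your model's inference; it returns a ranked list.
    results = [{"filename": filename, "label": "example", "score": 1.0}]
    return results if top_k is None else results[:top_k]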

If a request's body does not match the template, the request will be rejected.

Click Next and Create to start the endpoint. Once the endpoint successfully starts, the dashboard should indicate that the endpoint is in the running state. Click the Connect button to view the public URL of the endpoint as well as an example command to query one of its routes.

You can also access the endpoint's logs by clicking the View button. Log messages are sorted in descending order (most recent on top) and new log messages appear automatically as they are generated. If there are many log messages, you can scroll down on the page to see older logs.

To view detailed information about the endpoint, click on the endpoint name from the Endpoint dashboard.

Using the SDK

You can also create an endpoint using the trainML SDK. Specify endpoint as the type and provide an endpoint dictionary to the job create command. The endpoint dictionary requires a routes key that is a list of routes. Each route is a dictionary with path, verb, function, file, positional, and body keys. The body key is a list of the parameters for the request body, with keys name, type, optional, and default_value. An example specification is the following:


job = await trainml.jobs.create(
    name="Endpoint",
    type="endpoint",
    ...
    endpoint=dict(
        routes=[
            dict(
                path="/predict",
                verb="POST",
                function="predict_image",
                file="predict",
                positional=True,
                body=[dict(name="filename", type="str")],
            )
        ]
    ),
)

The type field for the parameters must be one of str, int, float, bool, dict, or list. Once the job is in the running state, you can retrieve the endpoint URL from the url property of the job (e.g. job.url).
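
As a brief sketch, assuming job.url is the endpoint's base URL and reusing the /predict route and body from the example above:

import requests

# Hypothetical request once the job created above is running; the route path
# and body attribute come from the endpoint specification in the example.
response = requests.post(f"{job.url}/predict", json={"filename": "image.jpg"})
print(response.json())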

Using the CLI

The trainML CLI can also create endpoints. Endpoint routes can be specified using the --route option. The data for each route must be properly formatted JSON with the same keys and values as required in the SDK above. An example is the following:

trainml job create endpoint \
    --git-uri https://github.com/trainML/simple-tensorflow-classifier.git \
    --route '{"verb": "POST", "path": "/predict", "file": "predict", "positional": true, "body": [{"name": "filename", "type": "str"}], "function": "predict_image"}' \
    "New Inference Endpoint"

This command will print the endpoint URL to the terminal once the endpoint is running.