Deploying an Inference Endpoint

Endpoints deploy a trained model as a REST API. They are useful when predictions are needed in real time and the input data is small enough to send in an HTTP request.

Creating an Endpoint

Click the Deploy an Inference Endpoint link on the Home screen or the Create button on the Endpoints Dashboard to open a new job form. Enter a name that will uniquely identify this endpoint for you. Select the type of GPU you would like to use by clicking an available GPU card, then select how many GPUs you want attached to the endpoint in the GPU Count field. A maximum of 4 GPUs per endpoint is allowed. If any options in these fields are disabled, there are not enough GPUs of that type available to satisfy your request. Specify the amount of disk space to allocate for this endpoint's working directory in the Disk Size field. Be sure to allocate enough space to run the endpoint, as this allocation cannot be changed after the endpoint is created.

Model

If you created a trainML model in the previous step, select trainML as the Model Type and select it from the list. Otherwise, select Git and specify the git clone URL of the repository.

Routes

Routes allow you to direct HTTP requests to your model code. Click the Add Route button to add a new route. Keep POST as the HTTP Verb and specify /predict as the Path. Specify the file within the model code to use as the File Name (e.g. predict.py) and the method to call as the Function Name (e.g. predict_file).
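
To make the mapping concrete, here is a minimal sketch of what the model code behind this route might look like. The file name, function name, and argument name are the examples used above; the function body is hypothetical.

```python
# predict.py -- hypothetical model code behind the POST /predict route.
# The route's File Name points at this file and its Function Name at the
# function below; the argument name is illustrative.

def predict_file(filename):
    """Run inference on the named input and return a JSON-serializable result."""
    # Real model code would load the input and run the model here; this
    # stub just echoes its input so the sketch stays self-contained.
    return {"filename": filename, "prediction": "placeholder"}
```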

Click Add Parameter to configure the information that will be provided in the HTTP request body when the endpoint is called. These parameters must exactly match the arguments the specified function expects. If the function takes positional arguments, check the Function Uses Positional Arguments box and order the parameters in the template to match the order in which the function expects them. If the function takes keyword arguments, each parameter name in the request body must match the corresponding argument name of the function.

Specify the expected Data Type for each parameter. If a request's body does not match the template, the request will be rejected.
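
For example, continuing the hypothetical predict_file sketch above, a route with a single String parameter named filename would accept a request body like this (the file path is illustrative):

```python
# JSON body for a keyword-argument route: the key must match the
# function's argument name, and the value must match the declared Data Type.
body = {"filename": "images/cat.png"}

# With Function Uses Positional Arguments checked instead, only the order
# of the parameters in the template matters, not their names.
```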

Review

Once you click Next on the job form, you are given the opportunity to review your endpoint configuration for errors. Review these settings carefully. They cannot be changed once a job is started.

info

If the number of GPUs requested exceeds the GPUs of that type currently available, the review form will display a message stating that the job will queue until GPUs of the selected type become available. You are not billed for waiting jobs.

Using the Endpoint

Once the endpoint successfully starts, the dashboard should indicate that the endpoint is in the running state. Click the Connect button to view the public URL of the endpoint as well as an example command to query one of its routes.
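
As a minimal sketch, a route like the POST /predict example above could be queried with Python's requests library. The URL here is a placeholder; use the one shown in the Connect dialog.

```python
import requests

# Placeholder URL -- copy the real one from the endpoint's Connect dialog.
ENDPOINT_URL = "https://example-endpoint.trainml.cloud"

# Call the POST /predict route; the JSON body must match the route's
# parameter template.
response = requests.post(
    f"{ENDPOINT_URL}/predict",
    json={"filename": "images/cat.png"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```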

You can also access the endpoint's logs by clicking the View button. Log messages are sorted in descending order (most recent on top) and new log messages appear automatically as they are generated. If there are many log messages, you can scroll down on the page to see older logs.

To view detailed information about the endpoint, click on the endpoint name from the Endpoint dashboard.

tip

If the endpoint becomes overloaded, it will return HTTP 503 errors. If you are calling the endpoint programmatically, be sure to check for this error response and retry the request using a backoff algorithm.
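
Here is one minimal sketch of such a retry loop with exponential backoff, reusing the placeholder URL from the earlier example:

```python
import time

import requests

def call_with_backoff(url, body, max_retries=5):
    """POST to an endpoint route, retrying with exponential backoff on HTTP 503."""
    delay = 1.0
    for _ in range(max_retries):
        response = requests.post(url, json=body, timeout=30)
        if response.status_code != 503:
            return response
        # The endpoint is overloaded; wait, then try again with a longer delay.
        time.sleep(delay)
        delay *= 2
    raise RuntimeError(f"Endpoint still overloaded after {max_retries} attempts")

response = call_with_backoff(
    "https://example-endpoint.trainml.cloud/predict",  # placeholder URL
    {"filename": "images/cat.png"},
)
```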

All endpoints are configured to return an HTTP 200 response to GET requests on the route /ping. This route can be used for endpoint healthchecks.
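
A health check therefore only needs to verify the status code, for example (again with a placeholder URL):

```python
import requests

ENDPOINT_URL = "https://example-endpoint.trainml.cloud"  # placeholder

# Every running endpoint answers GET /ping with HTTP 200.
healthy = requests.get(f"{ENDPOINT_URL}/ping", timeout=10).status_code == 200
print("healthy" if healthy else "unhealthy")
```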

Stopping, Restarting, and Terminating an Endpoint

When you are done actively using the endpoint, you can stop it by clicking Stop from the trainML dashboard. This will stop billing.

If you want to restart the endpoint, click the Restart button. Endpoints always start from the same source environment and model code with which they were originally created. Any temporary files or changes to the environment made during a previous run will not be preserved; this ensures environment integrity.

caution

Stopped endpoints may be automatically purged after two weeks of non-use.

If you are finished with an endpoint, click the Terminate button. This will purge the endpoint's environment and all its data. If an endpoint fails, it cannot be restarted, only terminated.