Running an Inference Job
Inference jobs are designed to run trained models on new data as part of a model inference pipeline and deliver the predictions back to an external location.
Starting a Job
Click the Run an Inference Job link on the Home screen or the Create button on the Inference Jobs Dashboard to open a new job form. Enter a job name that will uniquely identify this job for you. Select the type of GPU you would like to use by clicking an available GPU card, and select how many GPUs you want attached to each worker in the GPU Count field. A maximum of 4 GPUs per inference job is allowed. If any options in these fields are disabled, there are not enough GPUs of that type available to satisfy your request. Specify the amount of disk space you want allocated for this job's working directory in the Disk Size field. Be sure to allocate enough space to complete the job, as this allocation cannot be changed once the job is created.
Inference jobs automatically download the source data and upload the predictions as part of their execution. To specify the location of the source data, select the required storage provider in the Input Type field and the path to the data in the Input Storage Path field. When the job starts, the data from this location is placed into the /opt/trainml/input directory. To specify where to send the output predictions, select the required storage provider in the Output Type field and the path to send the results to in the Output Storage Path field.
In order for the automatic output upload to work, you must save the results in the /opt/trainml/output folder. The recommended way to configure this in your code is to use the TRAINML_OUTPUT_PATH environment variable.
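As a minimal sketch of that convention, code that saves predictions can resolve the output directory from the environment variable and fall back to the documented default path. The `save_predictions` helper and the `predictions.csv` file name here are illustrative, not part of the platform:

```python
import os


def save_predictions(rows, filename="predictions.csv"):
    """Write prediction rows where the job's automatic upload will find them.

    The directory is taken from the TRAINML_OUTPUT_PATH environment
    variable, falling back to the documented default /opt/trainml/output.
    """
    output_dir = os.environ.get("TRAINML_OUTPUT_PATH", "/opt/trainml/output")
    path = os.path.join(output_dir, filename)
    with open(path, "w") as f:
        f.write("id,prediction\n")
        for row_id, pred in rows:
            f.write(f"{row_id},{pred}\n")
    return path
```

Anything written under that directory is uploaded to the configured Output Storage Path when the worker finishes.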
If you created a trainML model in the previous step, select trainML Model as the Model Type and select it from the list. Otherwise, select Git and specify the git clone URL of the repository.
Specify the command to use to perform the inference operation with the selected model on the data that will be loaded. The command runs at the root of the model directory that was loaded. For example, if your inference code is called predict.py and takes parameters for the data location (--data-path) and where to save the predictions (--output-path), the command would be the following:
python predict.py --data-path=$TRAINML_DATA_PATH --output-path=$TRAINML_OUTPUT_PATH
This command takes advantage of the trainML environment variables to ensure the code uses the correct directory structure.
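A minimal predict.py matching that command could parse those flags as below. The prediction step is a placeholder (it just lists the input files), and the fallback defaults assume the documented /opt/trainml/input and /opt/trainml/output locations:

```python
import argparse
import os


def main():
    # Parse the flags passed on the job command line. The defaults mirror
    # the trainML environment variables used in the example command, so the
    # script also works if the flags are omitted.
    parser = argparse.ArgumentParser(description="Run inference on a data directory")
    parser.add_argument(
        "--data-path",
        default=os.environ.get("TRAINML_DATA_PATH", "/opt/trainml/input"),
    )
    parser.add_argument(
        "--output-path",
        default=os.environ.get("TRAINML_OUTPUT_PATH", "/opt/trainml/output"),
    )
    args = parser.parse_args()

    # Placeholder inference step: emit one row per input file. Replace this
    # with real model loading and prediction logic.
    results_file = os.path.join(args.output_path, "predictions.csv")
    with open(results_file, "w") as f:
        f.write("file,prediction\n")
        for name in sorted(os.listdir(args.data_path)):
            f.write(f"{name},0\n")


if __name__ == "__main__":
    main()
```

Because the results land under the output path, the job's automatic upload delivers them to the configured Output Storage Path with no extra steps.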
Once you click Next on the job form, you are given the opportunity to review your inference job configuration for errors. Review these settings carefully. They cannot be changed once a job is started.
If the number of GPUs requested exceeds the currently available GPUs of that type, you will receive a message on the review form stating that the job will queue until GPUs become available. When this occurs, the job waits until GPUs of the selected type become available. You are not billed for waiting jobs.
Monitoring the Job
Once a job successfully starts, the dashboard indicates that the job is in the running state. Click the View button to access the job logs. Log messages are sorted in descending order (most recent on top), and new log messages appear automatically as they are generated. If there are many log messages, you can scroll down on the page to see older logs.
To view detailed information about the job, click on the job name from the Inference Job dashboard.
Stopping and Terminating a Job
When each worker finishes executing, it automatically stops, and billing for that worker also stops. When all workers complete, the job is considered finished. You can also interrupt a running job or job worker by clicking Stop on either the job or the job worker.
Finished jobs may be automatically purged after 24 hours.
When a job is finished, you can review its details and download an extract of the worker logs. When you no longer need to see the details of the job, click the