trainML is making it even easier to run any GPU-enabled workload by allowing customers to use job images directly from NVIDIA's NGC Catalog.
NVIDIA's NGC Catalog provides enterprise-grade container images, including pre-trained models and industry-specific software packages. By configuring your NGC API key as a trainML third-party key, you can specify NGC container images as the basis for any job type using the customer-provided environment field on the job specification.
How It Works
Create an API key for your NGC account that has access to the images you wish to use. Once you have the API key, go back to the trainML third-party key configuration page, and select NVIDIA NGC from the Add menu under Third-Party Keys. Enter the API key in the NGC API Key field and click the check button.
Go back to the NGC Catalog and find the pull command of the container you wish to run. For example, to run a specific version of the RAPIDS container, search the tags and copy the pull command, e.g. docker pull nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04.
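Only the image reference from the pull command is needed later, not the docker pull prefix. As a minimal sketch, the helper below (image_from_pull_command is a hypothetical name, not part of any trainML or NGC tooling) extracts that reference from a copied command using only the Python standard library.

```python
import shlex


def image_from_pull_command(command: str) -> str:
    """Return the image reference from a 'docker pull <image>' command.

    Hypothetical helper for illustration; it only handles the simple
    form of the command shown in the NGC Catalog.
    """
    parts = shlex.split(command)
    if len(parts) != 3 or parts[:2] != ["docker", "pull"]:
        raise ValueError(f"not a simple 'docker pull <image>' command: {command!r}")
    return parts[2]


pull_command = (
    "docker pull nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04"
)
image = image_from_pull_command(pull_command)
print(image)
```

The printed value is the text to paste into the image field when configuring the job.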
To start a Notebook using this container image, go to the Notebook Dashboard and click Create. Select the required resources, data, and model specifications, and expand the Environment section. Select Customer Provided as the Base Environment and paste the image name from the pull command (e.g. nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04) as the Image. Additionally, since most images in NGC do not have the jupyterlab package installed by default, you must add it to the pip package dependencies field to ensure the notebook starts properly. You can find more information about using customer-provided job images here.
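The environment settings described above can be summarized as a small data structure. The sketch below is illustrative only: the function name and dictionary keys are hypothetical and simply mirror the form labels (Base Environment, Image, pip packages); they are not the trainML API. It shows the one rule worth automating, which is making sure jupyterlab appears in the pip package list exactly once.

```python
def notebook_environment(image, pip_packages=None):
    """Describe a customer-provided notebook environment.

    Hypothetical structure for illustration; the keys mirror the form
    labels and are not actual trainML API field names.
    """
    packages = list(pip_packages or [])
    # Most NGC images do not ship jupyterlab, so add it unless it is
    # already listed (with or without a pinned version).
    already_listed = any(
        pkg.split("==")[0].strip().lower() == "jupyterlab" for pkg in packages
    )
    if not already_listed:
        packages.append("jupyterlab")
    return {
        "base_environment": "Customer Provided",
        "image": image,
        "pip_packages": packages,
    }


env = notebook_environment(
    "nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04",
    pip_packages=["pandas"],
)
print(env["pip_packages"])
```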
The disk size of customer-provided images counts toward the disk size quota (unlike trainML environments). Ensure you reserve enough disk space to accommodate the image. CUDA layers alone can be 3+ GB. If the image size is greater than the requested disk space, the job will fail.
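A quick sanity check before submitting a job is to compare the image size with the disk space you plan to request. The sketch below assumes you already know the approximate size of each image layer in GB; the layer sizes shown are made up for illustration, and the function names are hypothetical. Because the image shares the disk with your other job files, passing this check is necessary but not sufficient.

```python
def image_size_gb(layer_sizes_gb):
    """Total image size as the sum of its layer sizes (in GB)."""
    return sum(layer_sizes_gb)


def fits_requested_disk(layer_sizes_gb, requested_disk_gb):
    """Return True if the image alone fits in the requested disk space."""
    return image_size_gb(layer_sizes_gb) <= requested_disk_gb


# Made-up layer sizes: a CUDA layer of about 3 GB plus two smaller layers.
layers = [3.2, 1.5, 0.8]

print(fits_requested_disk(layers, requested_disk_gb=10))
print(fits_requested_disk(layers, requested_disk_gb=5))
```

With these example numbers the image fits in a 10 GB request but not in a 5 GB one, so the second configuration would be expected to fail.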
Once you submit the job, the trainML platform will automatically download the container image using your NGC account credentials, install the additional required packages, and start the notebook.