trainML is making it even easier to run any GPU-enabled workload by allowing customers to use job images directly from NVIDIA's NGC Catalog.
NVIDIA's NGC Catalog provides enterprise-grade container images, including pre-trained models and industry-specific software packages. Once you configure your NGC API key as a trainML Third-Party Key, you can use NGC container images as the basis for any job type by specifying them in the customer-provided environment field of the job specification.
How It Works
Create an API key for your NGC account that has access to the images you wish to use. Once you have the API key, go back to the trainML third-party key configuration page and select NVIDIA NGC from the Add menu under Third-Party Keys. Enter the API key in the NGC API Key field and click the check button.
Go back to the NGC Catalog and find the pull command of the container you wish to run. For example, to run a specific version of the RAPIDS container, search the tags and copy the pull command:
docker pull nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04
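Only the image name from the pull command is needed in later steps, not the docker pull prefix. As a minimal sketch, the helpers below (hypothetical names, not part of any trainML or NVIDIA tooling) strip the prefix and split the reference into registry, repository, and tag. They assume a plain "docker pull <image>" command with an explicit tag, as in the example above.

```python
from typing import Tuple


def image_from_pull_command(pull_command: str) -> str:
    """Return the image reference from a plain 'docker pull <image>' command."""
    parts = pull_command.strip().split()
    # Expect exactly three tokens: 'docker', 'pull', and the image reference.
    if len(parts) != 3 or parts[:2] != ["docker", "pull"]:
        raise ValueError(f"not a plain 'docker pull <image>' command: {pull_command!r}")
    return parts[2]


def split_image_reference(image: str) -> Tuple[str, str, str]:
    """Split 'registry/repository:tag' into (registry, repository, tag).

    Assumes the reference has a registry host and an explicit tag.
    """
    registry, _, remainder = image.partition("/")
    repository, _, tag = remainder.rpartition(":")
    if not registry or not repository or not tag:
        raise ValueError(f"expected 'registry/repository:tag', got {image!r}")
    return registry, repository, tag


if __name__ == "__main__":
    command = "docker pull nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04"
    image = image_from_pull_command(command)
    print(image)
    print(split_image_reference(image))
```

The value returned by image_from_pull_command is what you paste when a job asks for the image name.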
To start a Notebook using this container image, go to the trainML Notebook Dashboard and click Create. Select the required resources, data, and model specifications, and expand the Environment section. Select Customer Provided as the Base Environment and paste the image name from the pull command (e.g. nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04) as the Image. Additionally, since most images in NGC do not have the jupyterlab package installed by default, you must add it to the pip package dependencies field to ensure the notebook starts properly. You can find more information about using customer-provided job images here.
Note: The disk size of customer-provided images counts towards the disk size quota (unlike trainML environments). Ensure you reserve enough disk space to accommodate the image size. CUDA layers alone can be 3+ GB. If the image size is greater than the requested disk space, the job will fail.
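To make the disk-space note concrete, here is a rough sizing sketch. The function name, the headroom parameter, and the example numbers are all hypothetical and are not trainML settings. It simply adds the image size to whatever working space you expect to need and rounds up to a whole number of GB.

```python
import math


def minimum_disk_gb(image_size_gb: float, headroom_gb: float = 0.0) -> int:
    """Estimate the smallest whole-GB disk size that fits the image plus headroom.

    image_size_gb: size of the container image on disk (your own estimate).
    headroom_gb:   extra space for packages, outputs, and scratch files (an assumption).
    """
    if image_size_gb < 0 or headroom_gb < 0:
        raise ValueError("sizes must be non-negative")
    return math.ceil(image_size_gb + headroom_gb)


if __name__ == "__main__":
    # Hypothetical numbers: an 8.5 GB image with 5 GB of working space.
    print(minimum_disk_gb(8.5, headroom_gb=5))
```

Treat the result as a lower bound for the disk size you request, and leave a margin if you are unsure how large the image is once it is unpacked.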
Once you submit the job, the trainML platform will automatically download the container image using your NGC account credentials, install the additional required packages, and start the notebook.