NVIDIA NGC Catalog Integration

December 6, 2021

trainML

trainML is making it even easier to run any GPU-enabled workload by allowing customers to use job images directly from NVIDIA's NGC Catalog.

NVIDIA's NGC Catalog provides enterprise-grade container images, including pre-trained models and industry-specific software packages. By configuring your NGC API key as a trainML Third-Party Key, you can specify NGC container images as the basis for any job type using the customer provided environment field of the job specification.

How It Works

First, create an API key for your NGC account with access to the images you wish to use. Once you have the API key, open the trainML third-party key configuration page, select NVIDIA NGC from the Add menu under Third-Party Keys, enter the API key in the NGC API Key field, and click the check button.

Next, go to the NGC Catalog and find the pull command for the container you wish to run. For example, to run a specific version of the RAPIDS container, browse the available tags and copy the pull command, e.g. docker pull nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04.
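If you have Docker installed locally, an optional way to confirm that your API key can access the image before configuring anything on trainML is to log in to nvcr.io and pull the tag yourself. This is only a local sanity check, not a required step; it assumes your key is exported in an environment variable named NGC_API_KEY (for the NGC registry, the username is always the literal string $oauthtoken and the password is the API key):

    # Log in to the NGC container registry; the username is always "$oauthtoken"
    # and the password is your NGC API key.
    echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

    # Pull the tag you plan to use to verify the key has access to it.
    docker pull nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04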

To start a Notebook using this container image, go to the trainML Notebook Dashboard and click Create. Select the required resources, data, and model specifications, then expand the Environment section. Select Customer Provided as the Base Environment and paste the image name from the pull command (e.g. nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04) as the Image. Additionally, since most NGC images do not include the jupyterlab package by default, you must add it to the pip package dependencies field to ensure the notebook starts properly. You can find more information in the documentation on customer provided job environments.
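If you are unsure whether a particular image already ships JupyterLab, and you have Docker available locally, one optional way to check is to run pip inside the container before submitting the job. This is only a sketch and assumes the image's default Python environment is available from a login shell:

    # Check whether jupyterlab is already installed in the image.
    # A non-zero exit status means it is missing and should be added to the
    # pip package dependencies field of the job.
    docker run --rm nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04 \
        bash -lc "pip show jupyterlab"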

Note: The disk size of customer provided images counts toward the disk size quota (unlike trainML environments). Ensure you request enough disk space to accommodate the image; CUDA layers alone can exceed 3 GB. If the image size is greater than the requested disk space, the job will fail.
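If you have already pulled the image locally, one way to estimate how much disk space to request is to inspect its uncompressed size with Docker (the size shown in the NGC Catalog is typically the compressed download size, so the on-disk footprint is larger). A rough sketch:

    # Print the uncompressed size of the locally pulled image in GB.
    docker image inspect nvcr.io/nvidia/rapidsai/rapidsai:22.02-cuda11.4-runtime-ubuntu20.04 \
        --format '{{.Size}}' | awk '{printf "%.1f GB\n", $1 / 1e9}'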

Once you submit the job, the trainML platform will automatically download the container image using your NGC account credentials, install the additional required packages, and start the notebook.
