trainML customers can now select from a variety of popular public datasets when starting a notebook or training job. There is no storage cost for using these datasets, no matter how many jobs you attach them to.
How It Works
Public datasets are a collection of popular public domain machine learning datasets that trainML loads and maintains. If you plan to use one of the common deep learning datasets for your job, check the list of public datasets in the job form before provisioning worker storage and downloading the dataset yourself. If the dataset you need is not in the list and you would like it added, please contact us with a link to the dataset and a brief description of what you need it for.
Public datasets can be used by selecting Public Dataset from the Dataset Type field in the Data section of the job form. Select the desired dataset from the list and create the job. Once the job is running, you can access the dataset in the /opt/trainml/input directory or through the TRAINML_DATA_PATH environment variable.
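For example, a training script can locate the attached dataset without hard-coding the path. The following is a minimal sketch. It assumes that TRAINML_DATA_PATH holds the path of the mounted dataset directory, and it falls back to /opt/trainml/input when the variable is unset. The helper names dataset_root and list_dataset_files are hypothetical and are not part of any trainML SDK.

```python
import os
from pathlib import Path


def dataset_root() -> Path:
    """Return the dataset directory for the running job.

    Assumption: TRAINML_DATA_PATH points at the mounted dataset; if it is
    not set, fall back to the /opt/trainml/input directory described above.
    """
    return Path(os.environ.get("TRAINML_DATA_PATH", "/opt/trainml/input"))


def list_dataset_files(limit: int = 10) -> list:
    """List up to `limit` top-level entries of the dataset directory, sorted by name.

    Returns an empty list if the directory does not exist (for example,
    when the script runs outside a trainML job).
    """
    root = dataset_root()
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir())[:limit]


if __name__ == "__main__":
    print(f"Dataset root: {dataset_root()}")
    for name in list_dataset_files():
        print(name)
```

Reading the environment variable first keeps the script portable. When you run it locally, you can point TRAINML_DATA_PATH at your own copy of the dataset, and the same code will still work inside a job.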