trainML customers can now select from a variety of popular public datasets when starting a notebook or training job. There is no storage cost for using these datasets, no matter how many jobs you attach them to.
How It Works
Public datasets are a collection of popular public domain machine learning datasets that are loaded and maintained by trainML. If you plan to use one of the common deep learning datasets for your job, check the list of public datasets in the job form before provisioning worker storage and downloading the dataset yourself. If the dataset is not in the list and you would like it added, please contact us with a link to the dataset and a brief description of what you need it for.
Public datasets can be used by selecting Public Dataset from the Dataset Type field in the Data section of the job form. Select the desired dataset from the list and create the job. Once the job is running, you can access the dataset in the /opt/trainml/input directory, or using the TRAINML_DATA_PATH environment variable.
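As an illustration of locating the dataset from inside a running job, here is a minimal Python sketch. It reads the TRAINML_DATA_PATH environment variable and falls back to /opt/trainml/input when the variable is not set. The helper names resolve_dataset_dir and list_dataset_entries are hypothetical and are not part of any trainML library. The fallback behavior is also an assumption made for this example, and the layout of files inside the directory depends on the dataset you selected.

```python
import os
from pathlib import Path

# Default location named in the text above; used here only as a fallback
# (the fallback itself is an assumption of this sketch).
DEFAULT_INPUT_DIR = "/opt/trainml/input"


def resolve_dataset_dir(env=None):
    """Return the dataset directory, preferring TRAINML_DATA_PATH if set."""
    env = os.environ if env is None else env
    return Path(env.get("TRAINML_DATA_PATH") or DEFAULT_INPUT_DIR)


def list_dataset_entries(dataset_dir, limit=10):
    """Return up to `limit` sorted entry names, or [] if the directory is missing."""
    dataset_dir = Path(dataset_dir)
    if not dataset_dir.is_dir():
        return []
    return sorted(p.name for p in dataset_dir.iterdir())[:limit]


if __name__ == "__main__":
    data_dir = resolve_dataset_dir()
    print(f"Dataset directory: {data_dir}")
    for name in list_dataset_entries(data_dir):
        print(f"  {name}")
```

Running this at the start of a notebook or training script is a quick way to confirm that the dataset is attached and to see its top-level layout before pointing a data loader at it.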