Free to Use Public Datasets

September 3, 2020 · 2 min read

proxiML customers can now select from a variety of popular public datasets when starting a notebook or training job. There is no storage cost for using these datasets, no matter how many jobs you attach them to.

How It Works

Public datasets are a collection of popular public domain machine learning datasets that proxiML loads and maintains. If you plan to use one of the common deep learning datasets for your job, check the list of public datasets in the job form before provisioning worker storage and downloading it yourself. If the dataset you need is not in the list and you would like it added, please contact us with a link to the dataset and a brief description of what you need it for.

To use a public dataset, select Public Dataset from the Dataset Type field in the Data section of the job form, choose the desired dataset from the list, and create the job. Once the job is running, you can access the dataset in the /opt/ml/input directory or through the ML_DATA_PATH environment variable.
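As a rough illustration of locating the dataset from inside a running job, the sketch below reads ML_DATA_PATH and falls back to /opt/ml/input when the variable is not set. It assumes ML_DATA_PATH holds the path of the dataset directory; the helper names resolve_data_dir and preview_entries are hypothetical and not part of any proxiML library, and the layout of files inside the directory depends on the dataset you selected.

```python
import os
from pathlib import Path

# Default mount point mentioned above; used only when ML_DATA_PATH is unset.
DEFAULT_DATA_DIR = "/opt/ml/input"


def resolve_data_dir(environ=None):
    """Return the dataset directory, preferring ML_DATA_PATH when it is set.

    Assumption: ML_DATA_PATH contains a directory path. `environ` is a
    hypothetical hook that lets the lookup be exercised without touching
    the real process environment.
    """
    environ = os.environ if environ is None else environ
    return Path(environ.get("ML_DATA_PATH") or DEFAULT_DATA_DIR)


def preview_entries(data_dir, limit=10):
    """Return up to `limit` sorted top-level entry names in data_dir.

    Returns an empty list when the directory does not exist, for example
    when the script runs outside of a job.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []
    return sorted(p.name for p in data_dir.iterdir())[:limit]


if __name__ == "__main__":
    data_dir = resolve_data_dir()
    print(f"Dataset directory: {data_dir}")
    for name in preview_entries(data_dir):
        print(name)
```

Running this at the start of a notebook or training script is a quick way to confirm that the dataset is attached before pointing a data loader at it.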
