
Load Data Once, Reuse Infinitely


The trainML platform now allows customers to store datasets permanently and reuse those datasets for as many notebook and training jobs as desired.

How It Works

Creating a Dataset

Any dataset created through a training job automatically becomes persistent once it is loaded. In the Data section of the training job form, select Create New Dataset from the Dataset Type field. Then choose the source of the data from the Input Data Type field and specify the URI of the data in the Input Data Storage Path field. When the job starts, the new dataset is created with the default name Job - <Job Name>.

Using a Persistent Dataset

To use a persistent dataset, select My Dataset from the Dataset Type field in the Data section of the job form. Select the desired dataset from the list and create the job. Once the job is running, you can access the dataset in the /opt/trainml/input directory or through the TRAINML_DATA_PATH environment variable.
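As a minimal sketch of how a job script might locate the dataset, the snippet below reads TRAINML_DATA_PATH and falls back to /opt/trainml/input when the variable is not set. The helper names dataset_dir and list_dataset_files are hypothetical and chosen for illustration; only the directory and the environment variable come from the platform description above.

```python
import os
from pathlib import Path

# Default mount point described above; used only if the variable is unset.
DEFAULT_INPUT_DIR = "/opt/trainml/input"


def dataset_dir(default=DEFAULT_INPUT_DIR):
    """Return the dataset directory, preferring TRAINML_DATA_PATH (hypothetical helper)."""
    return Path(os.environ.get("TRAINML_DATA_PATH", default))


def list_dataset_files(root):
    """Return every regular file under root, sorted by path (hypothetical helper)."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


if __name__ == "__main__":
    root = dataset_dir()
    if root.is_dir():
        for path in list_dataset_files(root):
            print(path.relative_to(root))
    else:
        print(f"Dataset directory {root} not found; is this running inside a job?")
```

Reading the environment variable rather than hard-coding the path keeps the same script usable for local testing, where the variable can point at any directory.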

Pricing

The size of persistent datasets is included in the monthly storage charge. The first 50 GB of storage is free each month; storage in excess of 50 GB is charged at 0.20 credits per GB per month. Private datasets are charged based on the actual size of the dataset once it has been created. A dataset's size counts towards your monthly storage charge for as long as it exists, regardless of whether it is in use by a job or how many jobs are using it. For example, if you create a 50 GB dataset on the 15th of the month, that dataset counts as 25 GB/month in the computation of your storage charge for that month, because it existed for roughly half of the month. This number is the same whether you never use the dataset on a job that month or use it on 100 separate jobs concurrently for the rest of the month.
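The arithmetic can be sketched as follows. This is an illustration, not the platform's billing code: it assumes each dataset is prorated by the fraction of the month it existed and that the 50 GB free allowance is applied to the prorated total. The function names and the (size, fraction) input format are hypothetical.

```python
FREE_GB = 50.0           # free storage per month, from the pricing text
CREDITS_PER_GB = 0.20    # credits per GB per month above the free allowance


def billable_gb_months(datasets):
    """Sum prorated storage; each item is (size_gb, fraction_of_month_existing)."""
    return sum(size_gb * fraction for size_gb, fraction in datasets)


def monthly_storage_charge(datasets, free_gb=FREE_GB, rate=CREDITS_PER_GB):
    """Credits owed for the month, assuming the allowance applies to the prorated total."""
    excess = max(0.0, billable_gb_months(datasets) - free_gb)
    return excess * rate


if __name__ == "__main__":
    # The example from the text: a 50 GB dataset created mid-month.
    mid_month = [(50, 0.5)]
    print(billable_gb_months(mid_month))      # 25.0
    print(monthly_storage_charge(mid_month))  # 0.0, still under the free 50 GB
```

The number of jobs using a dataset does not appear anywhere in the calculation, which matches the rule that usage does not affect the storage charge.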