The trainML platform now allows customers to store datasets permanently and reuse those datasets for as many notebook and training jobs as desired.
How It Works
Creating a Dataset
Any datasets created through a training job automatically become persistent after they are loaded. In the Data section of the training job form, select Create New Dataset from the Dataset Type field. You can then choose the source of the data from the Input Data Type field and specify the URI of the data in the Input Data Storage Path field. When the job starts, the new dataset will be created with the default name of Job - <Job Name>.
Using a Persistent Dataset
Datasets can be used by selecting My Dataset from the Dataset Type field in the Data section of the job form. Select the desired dataset from the list and create the job. Once the job is running, you can access the dataset in the /opt/trainml/input directory, or using the TRAINML_DATA_PATH environment variable.
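As an illustration of reading the dataset from inside a running job, the following is a minimal sketch in Python. It assumes only what is stated above: the TRAINML_DATA_PATH environment variable and the /opt/trainml/input directory. The helper names dataset_root and list_dataset_files are hypothetical and are not part of any trainML API.

```python
import os
from pathlib import Path
from typing import List

# Mount point named in the documentation; used only if the variable is unset.
DEFAULT_DATA_PATH = "/opt/trainml/input"


def dataset_root() -> Path:
    """Return the dataset directory, preferring TRAINML_DATA_PATH when set."""
    return Path(os.environ.get("TRAINML_DATA_PATH", DEFAULT_DATA_PATH))


def list_dataset_files(root: Path) -> List[str]:
    """Return the paths of all files under root, relative to root, sorted."""
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    root = dataset_root()
    if root.is_dir():
        for name in list_dataset_files(root):
            print(name)
    else:
        print(f"Dataset directory not found: {root}")
```

Reading the path from the environment variable rather than hard-coding the directory keeps the script usable outside a job, for example by pointing TRAINML_DATA_PATH at a local copy of the data.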
Pricing
The size of persistent datasets is included in the monthly storage charge. The first 50 GB of storage is free each month; storage in excess of 50 GB is charged at 0.20 credits per GB per month. Private datasets are charged based on the actual size of the dataset once it has been created. A dataset's size counts towards your monthly storage charge for as long as it exists, regardless of whether it is currently being used by a job or how many jobs are using it. For example, if you create a 50 GB dataset on the 15th of the month, that dataset counts as 25 GB/month in the computation of your storage charge for that month. This number is the same whether you never use the dataset on a job for the entire month or use it on 100 separate jobs concurrently for the rest of the month.
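The arithmetic in the example above can be sketched as follows. This is an illustration only, not the platform's billing code: it assumes storage is prorated by the fraction of the month a dataset exists, and that a dataset created on the 15th is counted for 15 days of a 30-day month. The exact day-counting rule is not stated here, and the function names are hypothetical.

```python
FREE_GB = 50.0      # free storage per month, from the pricing text
RATE_PER_GB = 0.20  # credits per GB per month above the free allowance


def prorated_gb_months(size_gb: float, days_stored: int, days_in_month: int) -> float:
    """Size weighted by the fraction of the month the dataset existed (assumed rule)."""
    return size_gb * days_stored / days_in_month


def monthly_storage_charge(total_gb_months: float) -> float:
    """Credits charged for the month; only storage above the free allowance is billed."""
    return max(0.0, total_gb_months - FREE_GB) * RATE_PER_GB


if __name__ == "__main__":
    # 50 GB dataset that exists for half of a 30-day month (assumed day count).
    usage = prorated_gb_months(50, 15, 30)
    print(usage)                          # 25.0
    print(monthly_storage_charge(usage))  # 0.0, still within the free 50 GB
```

Job count does not appear anywhere in the calculation, which matches the statement that the charge is the same whether the dataset is used by no jobs or by 100.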