
· 4 min read

Persistent Datasets just got even better. Not only can you use the same dataset across many jobs in parallel at no additional charge, but you can now also attach multiple datasets to a single job for free. If that weren't enough, you can now dynamically change the datasets attached to any notebook job as your needs evolve through the model development process. Additionally, more options have been added for job base environments, allowing you to save time and storage quota by using specific versions of popular frameworks.
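
For illustration, attaching two persistent datasets to a single notebook job could look something like the following with the trainML Python SDK. This is a minimal sketch: the keyword arguments shown (`data`, `datasets`, `type="existing"`), the dataset names, and the GPU settings are illustrative assumptions based on the SDK's general pattern, not a verbatim API reference.

```python
import asyncio
from trainml import TrainML

async def main():
    client = TrainML()
    # Create a notebook job with two existing persistent datasets attached.
    # Keyword names below are illustrative -- check the trainML docs for
    # the current jobs.create() signature.
    job = await client.jobs.create(
        name="multi-dataset-notebook",
        type="notebook",
        gpu_type="RTX 2080 Ti",
        gpu_count=1,
        disk_size=10,
        data=dict(
            datasets=[
                dict(name="image-data", type="existing"),
                dict(name="annotation-data", type="existing"),
            ],
        ),
    )
    print(job)

asyncio.run(main())
```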

· 4 min read

Customers who use trainML to compete in Kaggle competitions or to analyze public Kaggle datasets can now populate trainML datasets directly from Kaggle competitions or datasets. Their Kaggle account credentials can also be loaded automatically into notebook and training jobs for competition or kernel submissions.
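
As a rough sketch, populating a persistent dataset from a Kaggle competition might look like this with the trainML Python SDK. The `"kaggle"` source type, the competition slug, and the `wait_for` status value are assumptions about the SDK's options rather than a confirmed reference.

```python
import asyncio
from trainml import TrainML

async def main():
    client = TrainML()
    # Populate a persistent dataset straight from a Kaggle competition.
    # The "kaggle" source type and competition-slug source_uri are
    # illustrative assumptions.
    dataset = await client.datasets.create(
        name="titanic",
        source_type="kaggle",
        source_uri="titanic",  # Kaggle competition slug
    )
    # Wait for the download from Kaggle to finish before attaching the
    # dataset to notebook or training jobs.
    await dataset.wait_for("ready")

asyncio.run(main())
```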

· 3 min read

trainML training jobs can now run on data stored in Google Cloud Storage and upload their results to the same or another bucket. GCP access credentials can also be attached to notebooks and training job workers to provide them with easy access to other GCP services.
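
For illustration, a training job that reads from one Google Cloud Storage bucket and writes to another could be configured along these lines with the trainML Python SDK. The bucket names, repository URL, and the input/output keyword names are assumptions for the sake of the example.

```python
import asyncio
from trainml import TrainML

async def main():
    client = TrainML()
    # Training job that reads its input from a GCS bucket and writes its
    # results to another bucket.  Names and keyword arguments below are
    # illustrative.
    job = await client.jobs.create(
        name="gcs-training",
        type="training",
        gpu_type="RTX 2080 Ti",
        gpu_count=1,
        disk_size=10,
        workers=["python train.py"],
        data=dict(
            input_type="gcp",
            input_uri="gs://example-input-bucket/training-data",
            output_type="gcp",
            output_uri="gs://example-output-bucket/results",
        ),
        model=dict(source_type="git", source_uri="git@github.com:example/model.git"),
    )

asyncio.run(main())
```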

· 5 min read

trainML training jobs can now read data directly from your local computer and upload their results back to it without any cloud intermediary. If your dataset already resides on your local computer and you want to avoid the repetitive cycle of uploading to and downloading from cloud storage, this storage type is for you.
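
A minimal sketch of the local storage option with the trainML Python SDK might look like the following. The `"local"` type value, the directory paths, and the `connect()` call are assumptions about how the SDK exposes this feature; the key point is that the local machine must stay connected while data transfers.

```python
import asyncio
from trainml import TrainML

async def main():
    client = TrainML()
    # Training job whose input comes from a directory on this computer and
    # whose results are delivered back to it, with no cloud bucket in
    # between.  Paths and keyword names are illustrative.
    job = await client.jobs.create(
        name="local-data-training",
        type="training",
        gpu_type="RTX 2080 Ti",
        gpu_count=1,
        disk_size=10,
        workers=["python train.py"],
        data=dict(
            input_type="local",
            input_uri="~/projects/my-model/data",
            output_type="local",
            output_uri="~/projects/my-model/output",
        ),
    )
    # Local storage needs a live connection from this machine while the
    # job downloads its input and uploads its results.
    await job.connect()

asyncio.run(main())
```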

· 2 min read

The HTTP input data type is now available for trainML training jobs. This option is ideal for publicly available datasets hosted on HTTP or FTP servers. If you were previously using wget or curl in your training scripts to download data, this option is for you. Additionally, if you specify a path to an archive as the input storage path, the archive will automatically be extracted before being attached to the workers.
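
As a sketch, pointing a training job at a public archive over HTTP could look like this with the trainML Python SDK. The `"web"` type value, the example URL, and the other keyword arguments are illustrative assumptions, not a verbatim API reference.

```python
import asyncio
from trainml import TrainML

async def main():
    client = TrainML()
    # Training job whose input is a public archive served over HTTP.
    # Because the path points at an archive, it would be extracted before
    # the workers start.  Keyword names and the URL are illustrative.
    job = await client.jobs.create(
        name="http-input-training",
        type="training",
        gpu_type="RTX 2080 Ti",
        gpu_count=1,
        disk_size=10,
        workers=["python train.py"],
        data=dict(
            input_type="web",
            input_uri="https://example.com/datasets/images.tar.gz",
        ),
    )

asyncio.run(main())
```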