Persistent Datasets just got even better. Not only can you use the same dataset across many jobs in parallel at no additional charge, you can now attach multiple datasets to a single job for free. And if that weren't enough, you can now dynamically change the datasets attached to any notebook job as your needs evolve during model development. More options have also been added for job base environments, so you can save time and storage quota by using specific versions of popular frameworks.
Kaggle Datasets and API Integration
Customers who use trainML to compete in Kaggle competitions or to analyze public Kaggle datasets can now populate trainML datasets directly from Kaggle competitions or datasets. They can also automatically load their Kaggle account credentials into notebook and training jobs to use for competition or kernel submissions.
Centralized, Real-Time Training Job Worker Monitoring
Log output from training job workers can now be viewed centrally on the trainML platform in real time. Keep an eye on the training progress of all your job workers at once, so you can stop them early if they are no longer making progress.
Free-to-Use Public Datasets
trainML customers can now select from a variety of popular public datasets when starting notebook or training jobs. There is no storage cost for using these datasets, no matter how many jobs you attach them to.
Major UI Overhaul and Direct Notebook Access
The trainML platform experience has been redesigned to make it easier to manage notebooks, training jobs, and datasets independently. Additionally, notebooks are now accessed directly from the web interface instead of being launched through the connection utility.
Load Data Once, Reuse Infinitely
The trainML platform now allows customers to store datasets permanently and reuse those datasets for as many notebook and training jobs as desired.
Serverless Deep Learning on Private Git Repositories
You can now run trainML serverless deep learning training jobs on model code hosted in private git repositories that support SSH authentication.
Google Cloud Storage Integration Released
trainML training jobs can now run on data stored in Google Cloud Storage and upload their results to the same or another bucket. GCP access credentials can also be attached to notebooks and training job workers to provide them with easy access to other GCP services.
Skip the Cloud Data Transfers with Local Storage
trainML training jobs can now run directly on data from your local computer and upload their results back to it without using any cloud intermediary. If your dataset is already on your local computer and you want to avoid the repetitive cycle of uploading to and downloading from cloud storage, this storage type is for you.
Web (HTTP/FTP) Data Downloads Plus Auto-Extraction of Archives
The HTTP input data type is now available for trainML training jobs. This option is ideal for publicly available datasets hosted on public HTTP or FTP servers. If you were previously using wget or curl in your training scripts to download data, this option is for you. Additionally, if you specify the path to an archive as the input storage path, the archive is automatically extracted before being attached to the workers.