The trainML platform now allows customers to store models permanently and reuse those models for as many notebook and training jobs as desired.
How It Works
Models let you store a job's model directory permanently and reuse it as the starting model directory for subsequent jobs. Models incur storage charges based only on their size, and that storage counts toward the 50 GB free storage option. When a model is used in a job, the job's working directory space must be large enough to hold the model.
Models are immutable once created. When a model is used in a job, its contents can be modified while that job worker is running, but the changes do not affect the original model or any other jobs using that model. To keep the modifications, you must save the job as a new model. The maximum size of any model is 50 GB, but there is no limit on the number of models.
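Because both the model size limit and the job's working directory space depend on how large the model directory is, it can help to measure the directory before saving it as a model. The following is a minimal sketch, not part of the trainML tooling: the helper names are hypothetical, and it assumes the 50 GB limit means 50 * 1024**3 bytes, which the platform may count differently.

```python
import os
from pathlib import Path

# Assumption: 50 GB is treated as 50 * 1024**3 bytes. The platform may use
# decimal gigabytes instead, so treat this as an approximate check.
MAX_MODEL_BYTES = 50 * 1024**3


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symbolic links; only count files stored in the directory itself.
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def fits_model_limit(root, limit=MAX_MODEL_BYTES):
    """Report whether the directory is within the assumed size limit."""
    return directory_size_bytes(root) <= limit


if __name__ == "__main__":
    # Fall back to the documented model directory if the variable is not set.
    model_dir = os.environ.get("TRAINML_MODEL_PATH", "/opt/trainml/models")
    size = directory_size_bytes(model_dir)
    print(f"{model_dir}: {size / 1024**3:.2f} GiB, within limit: {size <= MAX_MODEL_BYTES}")
```

Running this inside a notebook or at the end of a training script gives an early warning before the model creation is started.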
Creating a Model
Models can be created from three different sources: external, notebooks, and training jobs.
External Model Source
To create a model from an external source, navigate to the Models Dashboard from the side navigation and click the Create button. Fill out the form with the necessary information and click Create. Once the model changes to the status ready, it can be used on new jobs.
Notebooks
To create a model from an existing notebook, select the notebook from the Notebook Dashboard and click Copy. The Copy button is only enabled when a single notebook is selected and that notebook is either running or stopped. Select Save as Model as the Copy Type. Enter the name for the new model in the Name field and click Copy. You will be automatically navigated to the Models Dashboard, where you can monitor the progress of the model creation. The model will then be populated from the current contents of the /opt/trainml/models directory inside the notebook instance.
Training Jobs
Training jobs can be configured to send their output to a trainML model instead of an external source. To create a model from a training job, select trainML as the Output Type in the data section of the job form. When this option is selected, the TRAINML_OUTPUT_PATH environment variable is redirected to the same location as TRAINML_MODEL_PATH (/opt/trainml/models). Once each worker in the training job finishes, it saves the entire directory structure of /opt/trainml/models to a new model named Job - <job name> if there is one worker, or Job - <job name> Worker <worker number> if there are multiple workers.
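Since everything under the output location is saved into the new model when a worker finishes, a training script only needs to write its artifacts there. The sketch below is illustrative and assumes nothing beyond the TRAINML_OUTPUT_PATH variable described above: the function names and the metrics.json file name are hypothetical, and the fallback path is only a convenience for running the script outside the platform.

```python
import json
import os
from pathlib import Path


def output_dir():
    """Resolve the job output directory from TRAINML_OUTPUT_PATH."""
    # Assumption: fall back to the documented model directory when the
    # variable is not set, for example when testing locally.
    return Path(os.environ.get("TRAINML_OUTPUT_PATH", "/opt/trainml/models"))


def save_metrics(metrics, filename="metrics.json"):
    """Write a metrics dictionary as JSON inside the output directory."""
    target = output_dir() / filename
    # Create the directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(metrics, indent=2))
    return target
```

A training loop could call save_metrics({"epoch": 3, "loss": 0.42}) next to its checkpoint writes, so the metrics travel with the weights into the resulting model.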
Using a Model
Models can be used by selecting trainML from the Model Type field in the Model section of the job form. Select the desired model from the list and create the job. Once the job is running, you can access the model in the /opt/trainml/models directory or through the TRAINML_MODEL_PATH environment variable.
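Inside a running job, code can locate the model through the environment variable rather than a hard-coded path. The following is a minimal sketch under that assumption: the helper names are hypothetical, and the file layout inside the model depends entirely on what was saved into it.

```python
import os
from pathlib import Path


def model_dir():
    """Resolve the model directory from TRAINML_MODEL_PATH."""
    # Fall back to the documented location if the variable is not set.
    return Path(os.environ.get("TRAINML_MODEL_PATH", "/opt/trainml/models"))


def list_model_files(root=None):
    """Return sorted relative paths of all files in the model directory."""
    root = Path(root) if root is not None else model_dir()
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    for name in list_model_files():
        print(name)
```

Listing the files first is a quick way to confirm that the expected model was attached before the job loads any weights from it.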