Models
Models store an immutable version of model code and its artifacts for reuse in other jobs. Models incur storage charges only for their size and can be used by an unlimited number of jobs simultaneously.
Models enable you to store a job's model directory permanently and reuse it as the starting point for the model directory of subsequent jobs. Models incur storage charges only for their size, and that size is also covered by the 50 GB free storage option. When a model is used in a job, the job's working directory space must be large enough to hold the model.
Models are immutable once created. When a model is used in a job, its contents can be modified while that job worker is running, but the changes do not affect the original model or any other job using that model. To keep the modifications, you must save the job as a new model. The maximum size of any model is 50 GB, but you can have unlimited models.
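Because a model cannot exceed 50 GB, it can help to measure a directory before saving it as a new model. The following is a minimal sketch, not part of any trainML tooling: the function names are hypothetical, and it assumes "50 GB" means 50 * 10**9 bytes, which may differ from how the platform counts.

```python
import os

# Assumption: "50 GB" is taken as 50 * 10**9 bytes; the platform may count differently.
MAX_MODEL_BYTES = 50 * 10**9


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_model_limit(root, limit=MAX_MODEL_BYTES):
    """Return True if the directory's total size is within the limit."""
    return directory_size_bytes(root) <= limit
```

Inside a running job, calling fits_model_limit("/opt/trainml/models") gives a quick check before the job is saved as a new model.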
Creating a Model
Models can be created from three different sources: external, notebooks, and training/inference job output.
External Model Source
To create a model from an external source, navigate to the Models Dashboard from the side navigation and click the Create button. Specify the name of the new model in the Name field and then select the Source Type of the location from which to populate the new model:
AWS: Select this option if the model data resides on Amazon S3.
Azure: Select this option if the model data resides on Azure Blob Storage.
GCP: Select this option if the model data resides on Google Cloud Storage.
Git: Select this option if the model data resides in a git repository.
Kaggle: Select this option if the model data is from a Kaggle Competition, Dataset, or Kernel.
Local: Select this option if the model data resides on your local computer. You will be required to connect to the model for this option to work. Jobs using the local storage options will wait indefinitely for you to connect.
Wasabi: Select this option if the model data resides on Wasabi Storage.
Web: Select this option if the model data resides on a publicly accessible HTTP or FTP server.
In the Path field, specify the path of the model data within the selected storage type. If you specify a compressed file (zip, tar, tar.gz, or bz2), the file will be automatically extracted. If you specify a directory path (ending in /), a sync runs from that path, downloading all files and subdirectories beneath it. Valid paths for each Source Type are the following:
AWS: Must begin with s3://.
Azure: Must begin with https://.
GCP: Must begin with gs://.
Git: Both http and ssh git repository formats are supported. To use the ssh format, you must configure a git ssh key.
Kaggle: Must be the short name of the competition, dataset, or kernel compatible with the Kaggle API.
Local: Must begin with / (absolute path), ~/ (home directory relative), or $ (environment variable path). Relative paths (using ./) are not supported.
Wasabi: Must begin with s3://.
Web: Must begin with http://, https://, ftp://, or ftps://.
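The prefix rules above can be checked before submitting the form. The following is a minimal sketch in plain Python, not part of any trainML API: the dictionary keys, function names, and return labels are hypothetical. Git and Kaggle are left out because their paths have no fixed prefix.

```python
# Prefix rules transcribed from the list above. The keys are illustrative
# labels chosen for this sketch, not identifiers from any trainML API.
PATH_PREFIXES = {
    "aws": ("s3://",),
    "azure": ("https://",),
    "gcp": ("gs://",),
    "local": ("/", "~/", "$"),
    "wasabi": ("s3://",),
    "web": ("http://", "https://", "ftp://", "ftps://"),
}

# Archive types that are described as being extracted automatically.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".bz2")


def is_valid_path(source_type, path):
    """Return True if path starts with a prefix allowed for source_type."""
    key = source_type.lower()
    if key not in PATH_PREFIXES:
        raise ValueError(f"no prefix rule for source type: {source_type!r}")
    return path.startswith(PATH_PREFIXES[key])


def transfer_mode(path):
    """Classify a path as a directory sync, an archive extraction, or other."""
    if path.endswith("/"):
        return "sync"
    if path.lower().endswith(ARCHIVE_SUFFIXES):
        return "extract"
    # The text does not describe any other case, so it is only labeled here.
    return "other"
```

For example, is_valid_path("local", "./models") returns False because relative paths are not supported, and transfer_mode("s3://bucket/data/") returns "sync".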
Source-Specific Fields
Type (Kaggle Only): The type of Kaggle data you are specifying: Competition, Dataset, or Kernel (Notebook).
Endpoint (Wasabi Only): The service URL of the Wasabi bucket you are using.
Path (Regional Datastore Only): The subdirectory inside the regional datastore to load the data from. Use / to load the entire datastore.
Click Create to start populating the model. If you selected any option except Local, the model download will take place automatically and the model will change to a state of ready when it is complete. If you selected Local, you must connect to the model by selecting the model and clicking the Connect button to proceed with the data population.
Notebooks
To create a model from an existing notebook, select the notebook from the Notebook Dashboard and click Copy. The Copy button is only enabled when a single notebook is selected and that notebook is either running or stopped. Select Save to trainML as the Copy Type. Select Model from the Type dropdown and enter the name for the new model in the New Model Name field. You have the option to copy either the /opt/trainml/models folder or the /opt/trainml/output folder. Select which folder you wish to copy from the Save Directory dropdown and click Copy to begin the copy process. You will be automatically navigated to the Models Dashboard, where you can monitor the progress of the model creation.
Training/Inference Job Output
Training or inference jobs can be configured to send their output to a trainML model instead of an external source. To create a model from a job, select trainML as the Output Type and model as the Output URI in the data section of the job form. When this option is selected, the TRAINML_OUTPUT_PATH environment variable is redirected to the same location as TRAINML_MODEL_PATH (/opt/trainml/models). Once each worker in the training job finishes, it saves the entire directory structure of /opt/trainml/models to a new model named Job - <job name> if there is one worker or Job - <job name> Worker <worker number> if there are multiple workers.
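The naming rule and the output redirection can be sketched as follows. This is an illustration only, not trainML code: both function names are hypothetical, and the worker numbering base is not stated in the text, so the caller supplies it. The script side writes to TRAINML_OUTPUT_PATH so that the same code works whether the output goes to a model or elsewhere.

```python
import os
from pathlib import Path


def output_model_name(job_name, worker_number, worker_count):
    """Build the model name described above for a finished worker."""
    if worker_count == 1:
        return f"Job - {job_name}"
    # Assumption: worker_number is passed in as the platform numbers workers.
    return f"Job - {job_name} Worker {worker_number}"


def save_artifact(filename, data, env=None):
    """Write bytes into the directory named by TRAINML_OUTPUT_PATH."""
    env = os.environ if env is None else env
    out_dir = Path(env["TRAINML_OUTPUT_PATH"])
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename
    target.write_bytes(data)
    return target
```

With the model output option selected, files written by save_artifact land in /opt/trainml/models and so become part of the saved model.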
Using a Model
Models can be used by selecting trainML from the Model Type field in the Model section of the job form. Select the desired model from the list and create the job. Once the job is running, you can access the model in the /opt/trainml/models directory or through the TRAINML_MODEL_PATH environment variable.
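Inside a job, a script might locate the model as in the sketch below. The helper names are hypothetical; only the TRAINML_MODEL_PATH variable and the /opt/trainml/models directory come from the text above. The fallback to the fixed directory is a convenience assumed for this example.

```python
import os
from pathlib import Path

# Directory named in the text; used here as a fallback if the variable is unset.
DEFAULT_MODEL_PATH = "/opt/trainml/models"


def model_dir(env=None):
    """Resolve the model directory from TRAINML_MODEL_PATH."""
    env = os.environ if env is None else env
    return Path(env.get("TRAINML_MODEL_PATH", DEFAULT_MODEL_PATH))


def list_model_files(root):
    """Return the relative paths of all files under root, sorted."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Reading the path from the environment variable rather than hard-coding it keeps the script portable if the mount location ever differs.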
Removing a Model
Models can only be removed once all jobs that are configured to use them are finished. To remove a model, ensure that the Active Jobs column is zero, select the model, and click the Delete button. Since this action is permanent, you will be prompted to confirm prior to deleting.