The easiest way to start a new project on the trainML platform is with a trainML Notebook. This tutorial walks through the process of starting your first notebook and using it to train a PyTorch model on ImageNet.
Creating the Notebook
Navigate to the Notebooks Dashboard and click the Create button. Input a memorable name as the job name and select an available GPU Type (the code in this tutorial assumes an RTX 3090). Expand the Data section and click Add Dataset. Select Public Dataset as the dataset type and select ImageNet. Expand the Model section. Keep git selected as the Model Type and specify the tutorial code git repository URL https://github.com/trainML/examples.git in the Model Code Location field to automatically download the tutorial's model code. Click Next to view a summary of the new notebook and click Create to start the notebook.
Using the Notebook
When the notebook reaches the running state, the Open button will appear. Click Open to open the Jupyter Lab environment. You will see the file explorer on the left pane with two folders: input and models. The input folder contains the ImageNet dataset you selected when creating the notebook. The models folder contains the contents of the git repository you specified when creating the notebook.
On the right pane, Jupyter Lab will display the Launcher window, which allows you to create new notebooks and files, access the terminal, etc. Click Terminal in the Other section to open a terminal window. The default working directory when opening a terminal is the models folder. If you run ls, you will see the root file structure of the git repository. Run nvidia-smi to see details about the attached GPU. For more information on using a Jupyter Notebook, refer to the project documentation.
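The same GPU check can be done from a notebook cell instead of the terminal. The following is a minimal sketch, not part of the tutorial repository: the helper name list_gpus is hypothetical, and it only assumes that the nvidia-smi command is on the PATH, returning an empty list when it is not.

```python
import shutil
import subprocess


def list_gpus():
    """Return (name, total_memory) tuples reported by nvidia-smi.

    Hypothetical helper for illustration. Returns an empty list if
    nvidia-smi is not installed or exits with an error.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        if not line.strip():
            continue
        # Each CSV row looks like "<gpu name>, <total memory>".
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


gpus = list_gpus()
if gpus:
    for name, memory in gpus:
        print(f"{name}: {memory}")
else:
    print("No GPU detected by nvidia-smi")
```

If the list is empty inside a running notebook, the job was probably created without a GPU attached or the driver is not visible to the environment.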
To start training the example model, open the example notebook by navigating to models/notebooks in the file explorer pane and double-clicking on the pytorch-imagenet.ipynb file. This will open the notebook in the right pane. This notebook is an adapted version of the ImageNet training script provided in the PyTorch Examples repository. Scroll down to the second code section with header Hyperparameters. Here you can see the default settings for training and modify them as needed.
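To give a sense of what such settings typically control, here is a hedged sketch of a hyperparameter block and a step learning-rate schedule. The values are the defaults of the upstream PyTorch ImageNet example and are used only as placeholders; the names hyperparameters and step_lr are hypothetical, and the notebook's own variables and defaults may differ.

```python
# Placeholder values taken from the upstream PyTorch ImageNet example.
# The tutorial notebook may use different names and defaults.
hyperparameters = {
    "epochs": 90,
    "batch_size": 256,
    "lr": 0.1,
    "momentum": 0.9,
    "weight_decay": 1e-4,
}


def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Learning rate after decaying by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Show how the learning rate changes over the course of training.
for epoch in range(0, hyperparameters["epochs"], 30):
    lr = step_lr(hyperparameters["lr"], epoch)
    print(f"epoch {epoch:2d}: lr = {lr:.4f}")
```

Reducing the number of epochs in a block like this shortens training roughly in proportion, at the cost of final accuracy.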
From the Run option in the menu bar, select Run All Cells to start training. Scroll down to the bottom of the notebook to see the output from the training loop. Training will run for approximately 3 hours on an RTX 3090. This time can be shortened or lengthened by modifying the
If you are planning to continue with the Parallel Training Experiments with Notebooks tutorial, you can reuse this notebook while it is running.
Stopping and Terminating the Notebook
When training is complete, you will see the training artifacts (model_best.pth.tar) in the same folder as the notebook. To download these files to your local computer, right-click on them and select Download from the menu. Once you're finished, stop the job by either clicking Stop from the trainML dashboard or shutting down the notebook server from its File menu. This will stop billing.
Closing the notebook window does not stop the notebook; it only disconnects you from the notebook server. You will be billed for a running notebook even if you are not connected to it.
Stopped notebooks can be restarted, and will retain any modifications to the environment made in previous sessions. If you want to restart a notebook, click the
Stopped notebooks may be automatically purged after two weeks of non-use. Be sure to save your work before stopping a notebook.
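Before stopping, it can help to confirm which training artifacts actually exist and how large they are. The sketch below is one way to do that from a notebook cell; the helper find_checkpoints is hypothetical, and it assumes the artifacts end in .pth.tar and sit in the current working directory, as model_best.pth.tar does in this tutorial.

```python
from pathlib import Path


def find_checkpoints(folder=".", pattern="*.pth.tar"):
    """Return (filename, size in MiB) for each matching file, sorted by name.

    Hypothetical helper; the folder and pattern are assumptions.
    """
    results = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            results.append((path.name, path.stat().st_size / 2**20))
    return results


checkpoints = find_checkpoints()
if checkpoints:
    for name, size_mib in checkpoints:
        print(f"{name}: {size_mib:.1f} MiB")
else:
    print("No checkpoint files found in the current folder")
```

An empty result usually means training has not finished an epoch yet or the cell was run from a different folder.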
When you are completely done with the tutorial, click the Terminate button. This will purge the notebook environment and all its data.