Get Started With Notebooks

The easiest way to start a new project on the trainML platform is with a trainML Notebook. This tutorial walks through the process of starting your first notebook and using it to train a PyTorch model on ImageNet.

Creating the Notebook

Navigate to the Notebooks Dashboard and click the Create button. Enter a memorable name as the job name and select an available GPU Type (the code in this tutorial assumes an RTX 3090). Expand the Data section and click Add Dataset. Select Public Dataset as the dataset type and select ImageNet. Expand the Model section. Keep git selected as the Model Type and enter the tutorial code git repository URL https://github.com/trainML/examples.git in the Model Code Location field to automatically download the tutorial's model code. Click Next to view a summary of the new notebook, then click Create to start the notebook.

Using the Notebook

When the notebook reaches the running state, the Open button will appear. Click Open to open the Jupyter Lab environment. You will see the file explorer in the left pane with two folders, input and models. The input folder contains the ImageNet dataset you selected when creating the notebook. The models folder contains the contents of the git repository you specified when creating the notebook.

In the right pane, Jupyter Lab will display the Launcher window, which allows you to create new notebooks and files, access the terminal, and more. Click Terminal in the Other section to open a terminal window. The default working directory when opening a terminal is the models folder. If you run ls, you will see the root file structure of the git repository. Run nvidia-smi to see details about the attached GPU. For more information on using a Jupyter Notebook, refer to the project documentation.
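The same GPU check can also be run from a notebook cell instead of the terminal. The following is a minimal sketch, assuming only that nvidia-smi is on the PATH when a GPU is attached. The helper name gpu_summary is hypothetical and is not part of the tutorial code.

```python
import shutil
import subprocess


def gpu_summary():
    """Return one CSV line per attached GPU, or None if nvidia-smi is unavailable."""
    # nvidia-smi ships with the NVIDIA driver, so it is missing on CPU-only machines.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit code usually means the driver could not reach a GPU.
    if result.returncode != 0:
        return None
    return result.stdout.strip()


print(gpu_summary())
```

If this prints None, the notebook has no visible GPU and the GPU Type selected at creation is worth re-checking.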

To start training the example model, open the example notebook by navigating to models/notebooks in the file explorer pane and double-clicking the pytorch-imagenet.ipynb file. This will open the notebook in the right pane. This notebook is an adapted version of the ImageNet training script provided in the PyTorch Examples repository. Scroll down to the second code section, with the header Hyperparameters. Here you can see the default settings for training and modify them as needed.

From the Run option in the menu bar, select Run All Cells to start training. Scroll down to the bottom of the notebook to see the output from the training loop. Training will run for approximately 3 hours on an RTX 3090. This time can be shortened or lengthened by modifying the epochs hyperparameter.
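As a rough guide to how the epochs setting affects run time, the sketch below scales the 3-hour estimate linearly with the number of epochs. Linear scaling is an assumption (it ignores fixed startup cost), the function name estimated_hours is hypothetical, and default_epochs must be read from the notebook's Hyperparameters section. The value 10 in the example call is a placeholder, not the notebook's actual default.

```python
def estimated_hours(epochs, default_epochs, default_hours=3.0):
    """Estimate training time, assuming time grows linearly with epochs.

    default_hours is the tutorial's approximate figure for an RTX 3090.
    default_epochs is whatever the Hyperparameters section shows by default.
    """
    if epochs < 0 or default_epochs <= 0:
        raise ValueError("epochs must be >= 0 and default_epochs must be > 0")
    return default_hours * epochs / default_epochs


# Placeholder default of 10 epochs, for illustration only.
print(estimated_hours(5, default_epochs=10))
```

Halving the epoch count under this assumption halves the estimate, which is useful when budgeting a first trial run.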

If you are planning to continue with the Parallel Training Experiments with Notebooks tutorial, you can reuse this notebook while it is running.

Stopping and Terminating the Notebook

When training is complete, you will see the training artifacts (checkpoint.pth.tar and model_best.pth.tar) in the same folder as the notebook. To download these files to your local computer, right-click them and select Download from the menu. Once you're finished, stop the job either by clicking Stop in the trainML dashboard or by shutting down the notebook server from its File menu. This will stop billing.
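Before downloading, it can help to confirm that both artifacts were actually written. The following is a minimal sketch, assuming it is run from a cell in the same folder as the notebook. The helper find_artifacts is hypothetical and is not part of the tutorial code.

```python
from pathlib import Path

# Artifact file names as given in this tutorial.
ARTIFACT_NAMES = ("checkpoint.pth.tar", "model_best.pth.tar")


def find_artifacts(folder="."):
    """Map each expected artifact name to its size in bytes, or None if missing."""
    folder = Path(folder)
    sizes = {}
    for name in ARTIFACT_NAMES:
        path = folder / name
        sizes[name] = path.stat().st_size if path.is_file() else None
    return sizes


for name, size in find_artifacts().items():
    status = "missing" if size is None else f"{size} bytes"
    print(f"{name}: {status}")
```

A missing entry suggests training has not finished (or has not yet completed an epoch), so the notebook should keep running before it is stopped.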

Warning

Closing the notebook window does not stop the notebook; it only disconnects you from the notebook server. You will be billed for a running notebook even if you are not connected to it.

Stopped notebooks can be restarted, and will retain any modifications to the environment made in previous sessions. If you want to restart a notebook, click the Restart button.

Caution

Stopped notebooks may be automatically purged after two weeks of non-use. Be sure to save your work before stopping a notebook.

When you are completely done with the tutorial, click the Terminate button. This will purge the notebook environment and all its data.