Customers using trainML to compete in Kaggle competitions or using public Kaggle datasets for analysis can now directly populate trainML datasets from Kaggle competitions or datasets, as well as automatically load their Kaggle account credentials into notebook and training jobs to use for competition or kernel submissions.
How It Works
To enable Kaggle integration in the trainML platform, you must first generate a Kaggle API token. Instructions to generate a new token can be found here. If you are already using the Kaggle CLI tool on your local computer, the API token is usually located at $HOME/.kaggle/kaggle.json.
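Before uploading the token, it can help to confirm that the local file is readable and complete. The following is a minimal sketch, assuming the file is a JSON object with the username and key fields that the Kaggle CLI reads; the helper name load_kaggle_token is hypothetical and is not part of trainML or the Kaggle CLI.

```python
import json
from pathlib import Path


def load_kaggle_token(path=None):
    """Parse a kaggle.json file and check it has the expected fields.

    Assumption: the token is a JSON object with "username" and "key".
    Defaults to $HOME/.kaggle/kaggle.json when no path is given.
    """
    token_path = Path(path) if path else Path.home() / ".kaggle" / "kaggle.json"
    with open(token_path, encoding="utf-8") as handle:
        token = json.load(handle)
    if not isinstance(token, dict):
        raise ValueError(f"{token_path} does not contain a JSON object")
    # Collect any required field that is absent or empty.
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"{token_path} is missing: {', '.join(missing)}")
    return token
```

Calling load_kaggle_token() with no argument checks the default location; a FileNotFoundError means the token has not been generated or was saved somewhere else.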
Adding Kaggle API Credentials
Once you have the kaggle.json file for your account, you can upload it to the trainML platform in the Third-Party Keys section of the Settings page. You can access this page by clicking the Settings menu option on the sidebar or by clicking your account name in the upper right of the toolbar. Navigate to the Third-Party Keys section and click the Add button. Select Kaggle from the list. Click the Upload Json File button that appears next to the trophy icon and select the kaggle.json file from your local computer. Click the blue checkmark button to upload the file. If the file is successfully uploaded, you should see Credentials File: kaggle.json next to the trophy icon. For security reasons, you cannot download the file again once it is uploaded; you can only remove it or upload a new file.
Populating a Dataset from Kaggle
Click the Datasets option on the sidebar and click the Create button to open the new dataset form. Enter a name for the new dataset in the Name field and select Kaggle from the Source Type dropdown. From the Type field, select Competition if you are downloading the data for a competition you have entered, or Dataset if you are downloading other public or personal datasets.
You can only download competition datasets if you have already read and accepted the competition rules through the Kaggle website.
For the Path field, you must specify the short name Kaggle uses for the competition or dataset. The two easiest ways to find this short name are:
- The URL path of the competition or dataset you wish to download. For example, if you are viewing this dataset on the 2020 US Election in your web browser, the URL in your address bar is https://www.kaggle.com/unanimad/us-election-2020. If you want to download this dataset into trainML, specify unanimad/us-election-2020 in the Path field, specifically, the URL component after www.kaggle.com/. If you are viewing the Mechanisms of Action competition in your web browser, the URL in your address bar is https://www.kaggle.com/c/lish-moa. If you want to download this competition's data into trainML, specify lish-moa in the Path field, specifically, the URL component after www.kaggle.com/c/.
- Viewing the API command from the Kaggle web interface. For datasets, if you click the triple dot button on the far right side of the Kaggle Dataset menu bar, next to the New Notebook button, there is a Copy API command button. If you click this for this dataset on the 2020 US Election, it will copy kaggle datasets download -d unanimad/us-election-2020 into your clipboard. If you want to download this dataset into trainML, specify unanimad/us-election-2020 in the Path field, specifically, the command component after download -d. For a competition, if you click the Data tab on the Kaggle Competition menu bar, right above the Data Explorer, it will list the API command to download the datasets. If you are viewing the Mechanisms of Action competition, you will see kaggle competitions download -c lish-moa. If you want to download this competition's data into trainML, specify lish-moa in the Path field, specifically, the command component after download -c.
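Both ways of finding the short name can be sketched in a few lines of Python. This is only an illustration of the two rules above, not trainML's own parsing logic; the function names are hypothetical, and the handling of URLs that begin with /competitions/ or /datasets/ is an added assumption that the examples above do not cover.

```python
import shlex
from urllib.parse import urlparse


def kaggle_path_from_url(url):
    """Return (Type, Path) form values for a Kaggle competition or dataset URL."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    # Competition URLs: www.kaggle.com/c/<name>
    # (assumption: /competitions/<name> is treated the same way).
    if len(parts) >= 2 and parts[0] in ("c", "competitions"):
        return "Competition", parts[1]
    # Assumption: tolerate an optional /datasets/ prefix before <owner>/<name>.
    if parts and parts[0] == "datasets":
        parts = parts[1:]
    # Dataset URLs: www.kaggle.com/<owner>/<name>
    if len(parts) >= 2:
        return "Dataset", f"{parts[0]}/{parts[1]}"
    raise ValueError(f"Cannot find a Kaggle short name in {url!r}")


def kaggle_path_from_command(command):
    """Return (Type, Path) form values from a copied Kaggle API command."""
    tokens = shlex.split(command)
    for flag, kind in (("-c", "Competition"), ("-d", "Dataset")):
        if flag in tokens:
            index = tokens.index(flag)
            if index + 1 < len(tokens):
                # The short name is the argument that follows the flag.
                return kind, tokens[index + 1]
    raise ValueError(f"No -c or -d argument found in {command!r}")
```

For the two examples above, the dataset URL and command both map to Dataset with unanimad/us-election-2020, and the competition URL and command both map to Competition with lish-moa.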
Click Create to submit the form. The trainML platform will automatically download the dataset and extract it for use on subsequent jobs. Once the dataset is Ready, you can attach it to any number of notebooks or training jobs concurrently.
Adding Kaggle Credentials to a Notebook
To give a notebook or training job access to your Kaggle account, select the Kaggle option from the Third-Party Keys field in the Environment section of the job form. Once the job is started, the kaggle CLI will be automatically configured to utilize these credentials. Additional instructions for interacting with Kaggle competitions using the kaggle CLI can be found here.
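As an illustration of what the configured CLI makes possible, a competition submission from inside a job could look roughly like the sketch below. It assumes the kaggle executable is on the PATH and uses the CLI's competitions submit command with its -c, -f, and -m options; the helper names, the competition name, and the file name are hypothetical placeholders, and some competitions only accept submissions made from Kaggle notebooks.

```python
import shutil
import subprocess


def build_submit_command(competition, submission_file, message):
    """Build the argument list for `kaggle competitions submit`."""
    return [
        "kaggle", "competitions", "submit",
        "-c", competition,      # competition short name
        "-f", submission_file,  # file to upload
        "-m", message,          # description shown with the submission
    ]


def submit(competition, submission_file, message):
    """Run the submission and return the CLI's standard output."""
    if shutil.which("kaggle") is None:
        raise RuntimeError("kaggle CLI not found on PATH")
    result = subprocess.run(
        build_submit_command(competition, submission_file, message),
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, submit("some-competition", "submission.csv", "first attempt") would upload submission.csv to the placeholder competition using the credentials attached to the job.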