
Multiple GPU Type Job Specifications


trainML jobs can now specify multiple GPU types, any of which can satisfy the job request. The trainML job scheduler automatically selects the most affordable GPU type available.

How It Works

If a job can be satisfied by multiple GPU models, you can now specify all of them in the job specification. The job scheduler then filters the available GPUs by both the selected GPU types and the Max Price setting to determine which resources are eligible. If multiple GPU types match, it schedules the job on the least costly type.

This is particularly useful when you have CloudBender compute nodes of different GPU types and any of them can satisfy a given job request. For example, if you have both V100 and A100 compute nodes, you can add both the V100 and A100 GPU types to the job specification. Since V100s are cheaper than A100s, the scheduler selects a V100 if one is available. If no V100s are available, it automatically runs the job on an available A100. If neither is available, the job waits and starts on whichever type becomes available first.
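The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the filter-then-pick-cheapest logic, not the trainML scheduler's actual implementation: the GpuOffer class, the select_gpu function, and the prices are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GpuOffer:
    """One currently available GPU resource (hypothetical structure)."""
    gpu_type: str
    price: float  # per-GPU price in illustrative units, not a real trainML rate


def select_gpu(
    offers: Iterable[GpuOffer],
    gpu_types: Iterable[str],
    max_price: float,
) -> Optional[GpuOffer]:
    """Return the cheapest available offer that matches the job request.

    Returns None when nothing matches, which corresponds to the job
    waiting until a matching GPU becomes available.
    """
    allowed = set(gpu_types)
    # Filter by both the requested GPU types and the max price.
    candidates = [
        offer for offer in offers
        if offer.gpu_type in allowed and offer.price <= max_price
    ]
    if not candidates:
        return None
    # Among the matches, pick the least costly one.
    return min(candidates, key=lambda offer: offer.price)


if __name__ == "__main__":
    # Assumed availability and prices, for illustration only.
    available = [GpuOffer("a100", 2.5), GpuOffer("v100", 1.5)]
    print(select_gpu(available, ["v100", "a100"], max_price=10))
```

With both types available, the sketch picks the V100 because it is the cheaper match. If the V100 offer is removed from the list, it falls back to the A100, and if no offer matches it returns None, mirroring a job that keeps waiting.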

Using the Web Platform

From the job form, click the GPU card for as many GPU Types as can satisfy your workload. Be sure to set the Max Price setting to a value that reflects the maximum amount per GPU you wish to pay for the job. Click Next to review your selection and Create to start the job.

Once the job passes the Waiting for GPUs status, you can see the GPU Type that was selected by clicking the job name on the dashboard and viewing the details.

Using the SDK

To specify multiple GPU Types in the SDK, provide an array as the gpu_types parameter in the job create function.

job = await trainml.jobs.create(
    name="Multi GPU Type Selection",
    type="inference",
    gpu_types=["rtx2080ti", "rtx3090", "v100", "a100"],
    gpu_count=1,
    max_price=10,
    disk_size=10,
    workers=[
        "python predict.py",
    ],
    ...
)

Using the CLI

To specify multiple GPU types in the CLI, use the --gpu-type option multiple times:

trainml job create notebook \
    --gpu-type rtx2080ti --gpu-type rtx3090 --gpu-type v100 --max-price 5 \
    "Multi GPU Type Notebook"