Customize your Cloud Compute

Audience: Users who want to select the hardware their works run on in the cloud.

Level: Intermediate


Customize my Work resources

In the cloud, you can configure which machine a work runs on by passing a CloudCompute object to your work's __init__ method:

import lightning as L


# A minimal work for illustration; replace run() with your own logic.
class MyCustomWork(L.LightningWork):
    def run(self):
        print("Running on the configured cloud compute")


# Run on a free, shared CPU machine. This is the default for every LightningWork.
MyCustomWork(cloud_compute=L.CloudCompute())

# Run on a dedicated, medium-size CPU machine (see specs below)
MyCustomWork(cloud_compute=L.CloudCompute("cpu-medium"))

# Run on a cheap GPU machine with a single GPU (see specs below)
MyCustomWork(cloud_compute=L.CloudCompute("gpu"))

# Run on a fast multi-GPU machine (see specs below)
MyCustomWork(cloud_compute=L.CloudCompute("gpu-fast-multi"))

Warning

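Custom base images are not supported with the default CPU cloud compute. For example, the following work sets a custom image but passes no cloud_compute, so it would run on the default machine, which is not supported:

import lightning as L

class MyWork(L.LightningWork):
    def __init__(self):
        # No cloud_compute argument, so this work runs on the default CPU machine.
        super().__init__(cloud_build_config=L.BuildConfig(image="my-custom-image"))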
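If you generate work configurations in your own deployment scripts, the constraint in this warning is simple to check before deploying. The sketch below is hypothetical: check_build_config and DEFAULT_COMPUTE are illustrative names, not part of the lightning API, and the function only encodes the single rule stated in the warning.

```python
from typing import Optional

# Hypothetical names for illustration; NOT part of the lightning API.
DEFAULT_COMPUTE = "default"


def check_build_config(compute_name: str = DEFAULT_COMPUTE, image: Optional[str] = None) -> None:
    """Raise ValueError if a custom base image is paired with the default CPU machine."""
    if image is not None and compute_name.lower() == DEFAULT_COMPUTE:
        raise ValueError(
            f"Custom base image {image!r} is not supported with the "
            f"{compute_name!r} cloud compute; choose another machine."
        )


# A custom image on a non-default machine passes the check.
check_build_config("cpu-small", image="my-custom-image")
```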

Here is the full list of supported machine names:

Hardware by Accelerator Type

Name             CPUs   GPUs              Memory (RAM)
default          1      0                 4 GB
cpu-small        2      0                 8 GB
cpu-medium       8      0                 32 GB
gpu              4      1 (T4, 16 GB)     16 GB
gpu-fast         8      1 (V100, 16 GB)   61 GB
gpu-fast-multi   32     4 (V100, 16 GB)   244 GB
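When choosing a machine name from code, a small lookup over the table above can pick the first machine that meets your CPU, GPU, and memory needs. This is a hypothetical helper: Machine and pick_machine are illustrative names, not part of the lightning API. The specs are copied from the table, the search follows the table's row order, and price is not considered.

```python
from typing import NamedTuple, Optional


class Machine(NamedTuple):
    name: str
    cpus: int
    gpus: int
    memory_gb: int


# Specs copied from the table above, in the same row order.
MACHINES = [
    Machine("default", 1, 0, 4),
    Machine("cpu-small", 2, 0, 8),
    Machine("cpu-medium", 8, 0, 32),
    Machine("gpu", 4, 1, 16),
    Machine("gpu-fast", 8, 1, 61),
    Machine("gpu-fast-multi", 32, 4, 244),
]


def pick_machine(cpus: int = 1, gpus: int = 0, memory_gb: int = 0) -> Optional[str]:
    """Return the first machine name (in table order) meeting every requirement, or None."""
    for machine in MACHINES:
        if machine.cpus >= cpus and machine.gpus >= gpus and machine.memory_gb >= memory_gb:
            return machine.name
    return None
```

For example, pick_machine(gpus=1) returns "gpu", which you could then pass as the name in L.CloudCompute(...). A request that no listed machine satisfies, such as eight GPUs, returns None.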

The up-to-date prices for these instances can be found here.


Stop my work when idle

By passing idle_timeout=X, the work is automatically stopped after it has been idle for X seconds.

import lightning as L

# Run on a single-GPU machine and turn it down as soon as it becomes idle.
MyCustomWork(cloud_compute=L.CloudCompute("gpu", idle_timeout=0))
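The idle_timeout semantics described in the parameter reference (the countdown starts when run() returns, whether it succeeded or failed, and the machine pauses once the timeout is reached) can be modeled with a small timer. This is a hypothetical sketch, not Lightning's implementation: IdleTimer and its methods are illustrative names, and treating idle_timeout=None as "never pause" is an assumption.

```python
from typing import Optional


class IdleTimer:
    """Hypothetical model of idle_timeout; not the lightning implementation."""

    def __init__(self, idle_timeout: Optional[int]) -> None:
        # Assumption: None means the work is never paused for being idle.
        self.idle_timeout = idle_timeout
        self.finished_at: Optional[float] = None  # when the last run() returned

    def on_run_start(self) -> None:
        # The work is busy, so no idle countdown is active.
        self.finished_at = None

    def on_run_end(self, now: float) -> None:
        # The countdown starts when run() returns, whether it succeeded or failed.
        self.finished_at = now

    def should_pause(self, now: float) -> bool:
        if self.idle_timeout is None or self.finished_at is None:
            return False
        return now - self.finished_at >= self.idle_timeout
```

With idle_timeout=0, as in the example above, should_pause returns True immediately after run() ends.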

CloudCompute

class lightning.app.utilities.packaging.cloud_compute.CloudCompute(name='default', disk_size=0, idle_timeout=None, shm_size=None, mounts=None, colocation_group_id=None, interruptible=False, _internal_id=None)

Bases: object

Configure the cloud runtime for a lightning work or flow.

Parameters:
  • name (str) – The name of the hardware to use. A full list of supported options can be found in Customize your Cloud Compute. If you have a request for more hardware options, please contact onprem@lightning.ai.

  • disk_size (int) – The disk size in gigabytes. The value you set here is allocated to the /home folder.

  • idle_timeout (Optional[int]) – The number of seconds to wait before pausing the compute when the work is running and idle. This timeout starts whenever your run() method succeeds (or fails). If the timeout is reached, the instance pauses until the next run() call happens.

  • shm_size (Optional[int]) – Shared memory size in MiB, backed by RAM. The minimum is 512 and the maximum is 8192; other values are adjusted in steps of 512, so for example 1100 becomes 1024. If set to zero, the container gets Docker's default of 64 MiB.

  • mounts (Union[Mount, List[Mount], None]) – External data sources which should be mounted into a work as a filesystem at runtime.

  • colocation_group_id (Optional[str]) – Identifier for a group of works to be colocated in the same datacenter. Set this to a string of at most 64 characters, and all works with this group ID will run in the same datacenter. If not set, the works are not guaranteed to be colocated.

  • interruptible (bool) – Whether to run on an interruptible machine, i.e. one that the cloud provider can stop at any time. These are also known as spot or preemptible machines, and they tend to be cheaper than on-demand machines.
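The numeric limits on shm_size and colocation_group_id can be mirrored in a pre-deployment check. The sketch below is hypothetical: effective_shm_size and check_colocation_group_id are illustrative names, not part of the lightning API. It assumes values are rounded down to a multiple of 512, which matches the documented example (1100 becomes 1024) but is not confirmed as the exact rule, and it only models an explicit integer, not the None default.

```python
from typing import Optional

# Limits taken from the parameter descriptions above.
MIN_SHM_MIB = 512
MAX_SHM_MIB = 8192
STEP_MIB = 512
DOCKER_DEFAULT_SHM_MIB = 64
MAX_GROUP_ID_LEN = 64


def effective_shm_size(shm_size: int) -> int:
    """Hypothetical model of the documented shm_size rules, in MiB."""
    if shm_size == 0:
        # Zero falls back to Docker's default shared memory size.
        return DOCKER_DEFAULT_SHM_MIB
    clamped = min(max(shm_size, MIN_SHM_MIB), MAX_SHM_MIB)
    # Assumption: values are rounded down to a multiple of 512.
    return (clamped // STEP_MIB) * STEP_MIB


def check_colocation_group_id(group_id: Optional[str]) -> None:
    """Raise ValueError if the group ID exceeds the documented 64-character limit."""
    if group_id is not None and len(group_id) > MAX_GROUP_ID_LEN:
        raise ValueError(
            f"colocation_group_id has {len(group_id)} characters; "
            f"the maximum is {MAX_GROUP_ID_LEN}."
        )
```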