Train on the cloud (basic)¶
Audience: Anyone looking to train across many machines at once on the cloud.
Why do I need cloud training?¶
Training on the cloud is a cost-effective way to train your models faster, because it gives you access to powerful GPU machines.
For example, if your model takes 10 days to train on a CPU machine, here’s how cloud training can speed up your training time:
Machine type | Training time | Cost (AWS 1 M60 GPU) |
---|---|---|
CPU | 10 days | $12.00 |
1 GPU | 2 days | $11.52 |
2 GPU | 1 day | $20.64 |
4 GPU | 12 hours | $19.08 |
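To make the trade-off in the table easier to read, the short sketch below derives the speedup over the CPU baseline and the implied hourly cost for each row. The numbers are copied from the table; the names `runs` and `summarize` are hypothetical helpers introduced only for this illustration, and the hourly figures are simple divisions of the listed totals, not quoted AWS prices.

```python
# Figures copied from the table above:
# (machine type, training time in hours, total cost in USD).
runs = [
    ("CPU", 10 * 24, 12.00),
    ("1 GPU", 2 * 24, 11.52),
    ("2 GPU", 1 * 24, 20.64),
    ("4 GPU", 12, 19.08),
]


def summarize(rows, baseline_hours):
    """Return speedup vs. the baseline and implied cost per hour for each row."""
    summary = []
    for machine, hours, cost in rows:
        summary.append(
            {
                "machine": machine,
                "speedup": baseline_hours / hours,
                "usd_per_hour": round(cost / hours, 2),
            }
        )
    return summary


# The CPU row is used as the baseline for the speedup calculation.
summary = summarize(runs, baseline_hours=runs[0][1])

for row in summary:
    print(
        f"{row['machine']:>5}: {row['speedup']:.0f}x faster, "
        f"about ${row['usd_per_hour']:.2f} per hour"
    )
```

Read this way, a single GPU finishes five times sooner than the CPU for slightly less total cost, while 2 and 4 GPUs buy further speed at a higher total cost.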
Start a cloud machine in < 1 minute¶
Lightning has a native cloud solution with various products (lightning-grid) designed for researchers and ML practitioners in industry. To start an interactive machine, go to Lightning Grid, create a free account, and then start a new Grid Session.
A Grid Session is an interactive machine with 1-16 GPUs per machine.
Open the Jupyter Notebook¶
Once the Session starts, open a Jupyter notebook.
Clone and run your model¶
On the Jupyter page, you can either work in a notebook or clone your code and run it via the CLI.
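Before launching a training run in the Session, it can be useful to confirm how many GPUs are actually visible. The snippet below is a minimal sketch, assuming PyTorch is installed in the Session environment; `count_visible_gpus` is a hypothetical helper name, and it falls back to 0 when PyTorch is not available rather than raising an error.

```python
import importlib.util


def count_visible_gpus() -> int:
    """Return the number of CUDA GPUs PyTorch can see (0 if PyTorch is missing)."""
    # Assumption: PyTorch is the framework in use; if it is not installed,
    # report zero GPUs instead of failing.
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch

    return torch.cuda.device_count()


gpu_count = count_visible_gpus()
print(f"Visible GPUs: {gpu_count}")
```

If the count matches the Session size you selected, the same number can typically be passed on to your training script, for example as the number of devices to train on.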
Cost¶
Lightning (via lightning-grid) gives the community access to cloud machines for free. However, you must buy credits on lightning-grid, which are used to pay the cloud providers on your behalf.
If you want to run on your own AWS account and pay the cloud provider directly, please contact our onprem team: onprem@pytorchlightning.ai
Next Steps¶
Here are the recommended next steps depending on your workflow.