{"cells": [{"cell_type": "markdown", "id": "7a72eb68", "metadata": {}, "source": ["\n", "# TPU training with PyTorch Lightning\n", "\n", "* **Author:** PL team\n", "* **License:** CC BY-SA\n", "* **Generated:** 2023-03-15T10:55:06.147601\n", "\n", "In this notebook, we'll train a model on TPUs. Updating one Trainer flag is all you need for that. The most up to documentation related to TPU training can be found [here](https://lightning.ai/docs/pytorch/stable/accelerators/tpu.html).\n", "\n", "---\n", "Open in [![Open In Colab](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAHUAAAAUCAYAAACzrHJDAAAIuUlEQVRoQ+1ZaVRURxb+qhdolmbTUVSURpZgmLhHbQVFZIlGQBEXcMvJhKiTEzfigjQg7oNEJ9GMGidnjnNMBs2czIzajksEFRE1xklCTKJiQLRFsUGkoUWw+82pamn79etGYoKek1B/4NW99/tu3e/dquJBAGD27NkHALxKf39WY39gyrOi+i3xqGtUoePJrFmznrmgtModorbTu8YRNZk5cybXTvCtwh7o6NR2KzuZMWNGh6jtVt7nA0ymT5/eJlF9POrh7PAQl6s8bGYa3PUum//htmebVtLRqW0q01M5keTk5FZFzU0oRle3+zxwg5Hgtb+PZiL/ZVohxCI+hL5JgjmfjPxZ26+33BG3dA+ealHPM4gQAo5rU59gsI8bRvl54t3Ca62mvHyUAhtOlLd5WSQpKcluBjumnoCLs1EARkVd9E8l3p9y2i7RbQ1B6pFwu/YDgW8KbHJHMTQrwnjz2oZm9M4pavOCfo5jWrgCaaMVcMs6/pNhDr0+AMN93XlxV7R6DNpyzi7W/OE+yIrsjU6rTrbKV5cd/pNyItOmTbMp6sbBB+EqaYJY4cWE3VUciNt1TpgfcRFv71Fi54xT5kSoyLvOBEJMOMxWXkFlBeBSX4u6Zkcs+3KszYRtiapbNRqF31UgetVuc8z9vBXIv1qD+F1f83B6uDlCUyfsZGepGPpmg01OB7EITQbhS9ribKy+DmP1DUiClLz4bnIHVOqa7BY+Z1wg5g3zgUvyehiNpnJKxSLc/ts76LKm0BzX3c0RNy1yXjDcB5lWoro4iNHQxM+f1kWeWQARAWQS++trISJTp061Kep25X/MycwtjuctSC5rxo7ppi7VNUox5+PhPHtrsS2O1qJ6yx1QujQUzm9sh6hbkBlvvGcN8hYnwjUjH6kjfZEd5c/jitz5Jc5U3ENnFynKl4eB7nyEgP2UZ+Yz3/rVEbyYr27qELrtC4FIC0J7sc7xWnmccdHfRRTs0VB+cA4lt+oFcRR/wUeH8FG5w2Mbx8FQ8TXEvv1xYf4wBP3O2WyL3/UVjpXWgIqaFeUPr+wTmDvUB7njH6/bOv+HRg4SqioAg5GDe1aB3ZeMTJkyRSBqkLsWqSEm0fZVBEN94zEZnYvrdx1JL5cxe+a+AbhSJecRRHW/ikTFRTa38dtQlNZ5CRKwFvUtZU/kvBoEF9Uxni/XqIM+dwKbTw3rhcxIf7gmr2M+H6SMwx8iBzJbw5oxeG3Lv5FX9B3AGaHPS8e8z77H7v9VMpvPG5ug1enh7eGK8h0LBTwUb+GInqzInlRUK65DmTPQu4c3+uQKjwKK77zwUxBX4Tq7yR1RuiwUsqlrABCM6esHdXoy47fk4+prYKy8ZF574x4V5BnHQBuf4g9Z9ld8U36L2aktZNNplNfw7zotwWTy5MkCUft4aLEopJj5/OPHl1BQqeAVOnHgNSQOqmBzq9V9cfEm/yx5ubMGKS9cYPZ3vx2OS/c6PVHUuUO7Y1Pci3BO/1zgq18byebfGemLtNF+6JRtOvMk926ibussZqM+1mNz4TWkH7rCbM5phwGRGDAaoF8fY5OHFnlldAA8sgoEXKnDukA1NgSeNjqkJT9brbN4pC9WRweYXyLugR73c+MYvyWfu0yC6+mjzN1Isfw3FKJS98CU/zI1IHFkFPR52cHL2FJk0sB6kMTERIGo9GzcPkLNfA0cwdwi/hfEYO86ZMd9w+y1egfM2T2Eh/vesMNwljSzuZRT420SW3eqy8N6aHMmwmnFUZ7/PGVPbIoNZvNU1BURdHs0bT2+HjL8sDSM2e6vi4Lj5NW8WOLVA6RTT2azxLV+bglaFNqLieqemS/gWkw7NyoAHo+2dEsiivengjKsPFoqWOvbSh/kxPaxyW/JRzH2Fl3EzD9/xjAefJqB3usKUFn/0Gb+S/d/jy3FN2yLOmnSJJtn6oehByEiHPSeXnDxFGPRnoFoaBJjcdQlbDwcjL1zTNuQpoxD7R0OG0uUTMi0fkVwdzBdYIwcwZunxrVJVLplNm54BZp7jfDfYLoNyqQi1K6KxIdHzmN+QQ2WjFIwUT2zTGdlRXo4NFXVUO4sgX5dFC7f0aP/ZlNeUjFBuL8Xjl6uRuP6aMjSjpjzsH62FDU7JhBuGccEXIvDfJFFBc/gHw80dklfCVYnRaDfpiJcutPA4F7qJsfJeUPQI+1fqMlNhFx1FM0GDqkjFVg7NojlQ0Vt4aM5ReSqcbpaCg8nCW5lRsBvbT4T1TLfFptsfh7gItzuKTdJSEiwKSrt1vcmnEXXrsLbYnWDA1bu+z2WKy9Arq+1KRqdfKsoBo0GcdtEpS/B1bO4v0cFiUhkjskvKcMrWwtAPHuwQq8Z+4LZ1vTQANfXt4J0DwZX9gWa9qh4XDM/voC9JXfwYEMMHJcfNtusn82ihvliVUwg5KrPGVf6GH94ZJpEZBen6EC4qYTHA1dXhW0JIex8txzv//c8lhzXIi/BFxOH9jGbQhZsRalTIBZZ8KkGyZAxeRQvXkFF1TWz/Hm46jNYUnjPbt3JxIkT7f6dSj8qfJJyVvBxgaIlblOyjtysNHWN9fjjqWi7glJfW3/S0Hlj2XnA8PhKT9w6g3Qx3XiXhvuxQsuT1proxBKI/AaZqY1Xz5muvY8G8XkRRCaHsfQsRAFDH/tZPbcYuHotOG0FRIqB4HR3wNVoIPLtz8ycTguu+jpEigE218vd1YCr5m+HpHMvEI9u4LTXwNWaLjl0iPwGAmIpeHx1VeCqTJdPs1/vweweQPO3HC24NhOhnTphwoQnfv6QSY2ICbkNmdSA4h87oaLaiYfn5diIEd4att2erOwJXbPUHp953p6orQVSUVWRAXBT8c/dJ5L9xhzaJGp71GR/wFP8P5V2z10NSC9T93QM2xUg8fHxT+zU9ijeU4naHon8CjFJXFzc8/kn+dN06q9QgF98SYSo2Xen2NjYZy5sR6f+4nLSK5Iam2PH/x87a1YN/t5sBgAAAABJRU5ErkJggg==){height=\"20px\" width=\"117px\"}](https://colab.research.google.com/github/PytorchLightning/lightning-tutorials/blob/publication/.notebooks/lightning_examples/mnist-tpu-training.ipynb)\n", "\n", "Give us a \u2b50 [on Github](https://www.github.com/Lightning-AI/lightning/)\n", "| Check out [the documentation](https://pytorch-lightning.readthedocs.io/en/stable/)\n", "| Join us [on Slack](https://www.pytorchlightning.ai/community)"]}, {"cell_type": "markdown", "id": "ff515d3c", "metadata": {}, "source": ["## Setup\n", "This notebook requires some packages besides pytorch-lightning."]}, {"cell_type": "code", "execution_count": null, "id": "b7b3bc76", "metadata": {"colab": {}, "colab_type": "code", "id": "LfrJLKPFyhsK", "lines_to_next_cell": 0}, "outputs": [], "source": ["! pip install --quiet \"ipython[notebook]>=8.0.0, <8.12.0\" \"lightning>=2.0.0rc0\" \"setuptools==67.4.0\" \"torch>=1.8.1, <1.14.0\" \"torchvision\" \"pytorch-lightning>=1.4, <2.0.0\" \"torchmetrics>=0.7, <0.12\""]}, {"cell_type": "markdown", "id": "1908173f", "metadata": {}, "source": ["### Install Colab TPU compatible PyTorch/TPU wheels and dependencies"]}, {"cell_type": "code", "execution_count": null, "id": "b1565420", "metadata": {}, "outputs": [], "source": ["! pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.8-cp37-cp37m-linux_x86_64.whl\n", "\n", "import lightning as L"]}, {"cell_type": "code", "execution_count": null, "id": "7f228c10", "metadata": {}, "outputs": [], "source": ["import torch\n", "import torch.nn.functional as F\n", "from torch import nn\n", "from torch.utils.data import DataLoader, random_split\n", "from torchmetrics.functional import accuracy\n", "from torchvision import transforms\n", "\n", "# Note - you must have torchvision installed for this example\n", "from torchvision.datasets import MNIST\n", "\n", "BATCH_SIZE = 1024"]}, {"cell_type": "markdown", "id": "c80ce4df", "metadata": {"lines_to_next_cell": 2}, "source": ["### Defining The `MNISTDataModule`\n", "\n", "Below we define `MNISTDataModule`. You can learn more about datamodules\n", "in [docs](https://lightning.ai/docs/pytorch/stable/data/datamodule.html)."]}, {"cell_type": "code", "execution_count": null, "id": "bfb8c16c", "metadata": {"lines_to_next_cell": 2}, "outputs": [], "source": ["class MNISTDataModule(L.LightningDataModule):\n", " def __init__(self, data_dir: str = \"./\"):\n", " super().__init__()\n", " self.data_dir = data_dir\n", " self.transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])\n", "\n", " self.dims = (1, 28, 28)\n", " self.num_classes = 10\n", "\n", " def prepare_data(self):\n", " # download\n", " MNIST(self.data_dir, train=True, download=True)\n", " MNIST(self.data_dir, train=False, download=True)\n", "\n", " def setup(self, stage=None):\n", " # Assign train/val datasets for use in dataloaders\n", " if stage == \"fit\" or stage is None:\n", " mnist_full = MNIST(self.data_dir, train=True, transform=self.transform)\n", " self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])\n", "\n", " # Assign test dataset for use in dataloader(s)\n", " if stage == \"test\" or stage is None:\n", " self.mnist_test = MNIST(self.data_dir, train=False, transform=self.transform)\n", "\n", " def train_dataloader(self):\n", " return DataLoader(self.mnist_train, batch_size=BATCH_SIZE)\n", "\n", " def val_dataloader(self):\n", " return DataLoader(self.mnist_val, batch_size=BATCH_SIZE)\n", "\n", " def test_dataloader(self):\n", " return DataLoader(self.mnist_test, batch_size=BATCH_SIZE)"]}, {"cell_type": "markdown", "id": "ace762e3", "metadata": {"lines_to_next_cell": 2}, "source": ["### Defining the `LitModel`\n", "\n", "Below, we define the model `LitMNIST`."]}, {"cell_type": "code", "execution_count": null, "id": "ece48107", "metadata": {}, "outputs": [], "source": ["class LitModel(L.LightningModule):\n", " def __init__(self, channels, width, height, num_classes, hidden_size=64, learning_rate=2e-4):\n", " super().__init__()\n", "\n", " self.save_hyperparameters()\n", "\n", " self.model = nn.Sequential(\n", " nn.Flatten(),\n", " nn.Linear(channels * width * height, hidden_size),\n", " nn.ReLU(),\n", " nn.Dropout(0.1),\n", " nn.Linear(hidden_size, hidden_size),\n", " nn.ReLU(),\n", " nn.Dropout(0.1),\n", " nn.Linear(hidden_size, num_classes),\n", " )\n", "\n", " def forward(self, x):\n", " x = self.model(x)\n", " return F.log_softmax(x, dim=1)\n", "\n", " def training_step(self, batch, batch_idx):\n", " x, y = batch\n", " logits = self(x)\n", " loss = F.nll_loss(logits, y)\n", " self.log(\"train_loss\", loss)\n", " return loss\n", "\n", " def validation_step(self, batch, batch_idx):\n", " x, y = batch\n", " logits = self(x)\n", " loss = F.nll_loss(logits, y)\n", " preds = torch.argmax(logits, dim=1)\n", " acc = accuracy(preds, y)\n", " self.log(\"val_loss\", loss, prog_bar=True)\n", " self.log(\"val_acc\", acc, prog_bar=True)\n", "\n", " def configure_optimizers(self):\n", " optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)\n", " return optimizer"]}, {"cell_type": "markdown", "id": "8065ea86", "metadata": {}, "source": ["### TPU Training\n", "\n", "Lightning supports training on a single TPU core or 8 TPU cores.\n", "\n", "The Trainer parameter `devices` defines how many TPU cores to train on (1 or 8) / Single TPU core to train on [1]\n", "along with accelerator='tpu'.\n", "\n", "For Single TPU training, Just pass the TPU core ID [1-8] in a list.\n", "Setting `devices=[5]` will train on TPU core ID 5."]}, {"cell_type": "markdown", "id": "437103c4", "metadata": {}, "source": ["Train on TPU core ID 5 with `devices=[5]`."]}, {"cell_type": "code", "execution_count": null, "id": "5c6e4868", "metadata": {}, "outputs": [], "source": ["# Init DataModule\n", "dm = MNISTDataModule()\n", "# Init model from datamodule's attributes\n", "model = LitModel(*dm.size(), dm.num_classes)\n", "# Init trainer\n", "trainer = L.Trainer(\n", " max_epochs=3,\n", " accelerator=\"tpu\",\n", " devices=[5],\n", ")\n", "# Train\n", "trainer.fit(model, dm)"]}, {"cell_type": "markdown", "id": "0ea0a0e1", "metadata": {}, "source": ["Train on single TPU core with `devices=1`."]}, {"cell_type": "code", "execution_count": null, "id": "b53b3b86", "metadata": {}, "outputs": [], "source": ["# Init DataModule\n", "dm = MNISTDataModule()\n", "# Init model from datamodule's attributes\n", "model = LitModel(*dm.dims, dm.num_classes)\n", "# Init trainer\n", "trainer = L.Trainer(\n", " max_epochs=3,\n", " accelerator=\"tpu\",\n", " devices=1,\n", ")\n", "# Train\n", "trainer.fit(model, dm)"]}, {"cell_type": "markdown", "id": "5f921d80", "metadata": {}, "source": ["Train on 8 TPU cores with `accelerator='tpu'` and `devices=8`.\n", "You might have to restart the notebook to run it on 8 TPU cores after training on single TPU core."]}, {"cell_type": "code", "execution_count": null, "id": "6eb627e7", "metadata": {}, "outputs": [], "source": ["# Init DataModule\n", "dm = MNISTDataModule()\n", "# Init model from datamodule's attributes\n", "model = LitModel(*dm.dims, dm.num_classes)\n", "# Init trainer\n", "trainer = L.Trainer(\n", " max_epochs=3,\n", " accelerator=\"tpu\",\n", " devices=8,\n", ")\n", "# Train\n", "trainer.fit(model, dm)"]}, {"cell_type": "markdown", "id": "0875f661", "metadata": {}, "source": ["## Congratulations - Time to Join the Community!\n", "\n", "Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the Lightning\n", "movement, you can do so in the following ways!\n", "\n", "### Star [Lightning](https://github.com/Lightning-AI/lightning) on GitHub\n", "The easiest way to help our community is just by starring the GitHub repos! This helps raise awareness of the cool\n", "tools we're building.\n", "\n", "### Join our [Slack](https://www.pytorchlightning.ai/community)!\n", "The best way to keep up to date on the latest advancements is to join our community! Make sure to introduce yourself\n", "and share your interests in `#general` channel\n", "\n", "\n", "### Contributions !\n", "The best way to contribute to our community is to become a code contributor! At any time you can go to\n", "[Lightning](https://github.com/Lightning-AI/lightning) or [Bolt](https://github.com/Lightning-AI/lightning-bolts)\n", "GitHub Issues page and filter for \"good first issue\".\n", "\n", "* [Lightning good first issue](https://github.com/Lightning-AI/lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)\n", "* [Bolt good first issue](https://github.com/Lightning-AI/lightning-bolts/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)\n", "* You can also contribute your own notebooks with useful examples !\n", "\n", "### Great thanks from the entire Pytorch Lightning Team for your interest !\n", "\n", "[![Pytorch Lightning](data:image/png;base64,NDA0OiBOb3QgRm91bmQ=){height=\"60px\" width=\"240px\"}](https://pytorchlightning.ai)"]}, {"cell_type": "raw", "metadata": {"raw_mimetype": "text/restructuredtext"}, "source": [".. customcarditem::\n", " :header: TPU training with PyTorch Lightning\n", " :card_description: In this notebook, we'll train a model on TPUs. Updating one Trainer flag is all you need for that. The most up to documentation related to TPU training can be found...\n", " :tags: Image,GPU/TPU,Lightning-Examples"]}], "metadata": {"jupytext": {"cell_metadata_filter": "colab_type,colab,id,-all", "formats": "ipynb,py:percent", "main_language": "python"}}, "nbformat": 4, "nbformat_minor": 5}