{"cells": [{"cell_type": "markdown", "id": "4b017e6e", "metadata": {"papermill": {"duration": 0.524855, "end_time": "2023-01-03T14:11:11.828462", "exception": false, "start_time": "2023-01-03T14:11:11.303607", "status": "completed"}, "tags": []}, "source": ["\n", "# Barlow Twins Tutorial\n", "\n", "* **Author:** Ananya Harsh Jha (ananya@pytorchlightning.ai)\n", "* **License:** CC BY-SA\n", "* **Generated:** 2023-01-03T15:09:26.174192\n", "\n", "This notebook describes the self-supervised learning method Barlow Twins.\n", "Barlow Twins differs from other recently proposed algorithms as it doesn't\n", "fall under the category of either contrastive learning, or methods like knowledge\n", "distillation or clustering. The simplicity of the loss function and its effectiveness\n", "in comparison to the current state of the art makes Barlow Twins an interesting\n", "case study.\n", "\n", "\n", "---\n", "Open in [![Open In Colab](){height=\"20px\" width=\"117px\"}](https://colab.research.google.com/github/PytorchLightning/lightning-tutorials/blob/publication/.notebooks/lightning_examples/barlow-twins.ipynb)\n", "\n", "Give us a \u2b50 [on Github](https://www.github.com/Lightning-AI/lightning/)\n", "| Check out [the documentation](https://pytorch-lightning.readthedocs.io/en/stable/)\n", "| Join us [on Slack](https://www.pytorchlightning.ai/community)"]}, {"cell_type": "markdown", "id": "9e7d262c", "metadata": {"papermill": {"duration": 0.023525, "end_time": "2023-01-03T14:11:11.869700", "exception": false, "start_time": "2023-01-03T14:11:11.846175", "status": "completed"}, "tags": []}, "source": ["## Setup\n", "This notebook requires some packages besides pytorch-lightning."]}, {"cell_type": "code", "execution_count": 1, "id": "be4928e0", "metadata": {"colab": {}, "colab_type": "code", "execution": {"iopub.execute_input": "2023-01-03T14:11:11.913461Z", "iopub.status.busy": "2023-01-03T14:11:11.912934Z", "iopub.status.idle": "2023-01-03T14:11:16.941423Z", 
"shell.execute_reply": "2023-01-03T14:11:16.940495Z"}, "id": "LfrJLKPFyhsK", "lines_to_next_cell": 0, "papermill": {"duration": 5.056331, "end_time": "2023-01-03T14:11:16.943449", "exception": false, "start_time": "2023-01-03T14:11:11.887118", "status": "completed"}, "tags": []}, "outputs": [], "source": ["! pip install --quiet \"setuptools==59.5.0\" \"matplotlib\" \"ipython[notebook]\" \"torch>=1.8\" \"torchvision\" \"torchmetrics>=0.7\" \"pytorch-lightning>=1.4\""]}, {"cell_type": "markdown", "id": "0c1031c1", "metadata": {"papermill": {"duration": 0.041149, "end_time": "2023-01-03T14:11:17.031116", "exception": false, "start_time": "2023-01-03T14:11:16.989967", "status": "completed"}, "tags": []}, "source": ["## Barlow Twins\n", "\n", "Barlow Twins finds itself in a unique place among the current state-of-the-art self-supervised learning methods. It does not fall under the existing categories of contrastive learning, knowledge distillation, or clustering-based methods. Instead, it creates its own category of redundancy reduction and achieves competitive performance with a simple yet effective loss function. 
In this tutorial, we look at coding up a small version of the Barlow Twins algorithm using PyTorch Lightning."]}, {"cell_type": "code", "execution_count": 2, "id": "b8260244", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:17.129802Z", "iopub.status.busy": "2023-01-03T14:11:17.129190Z", "iopub.status.idle": "2023-01-03T14:11:20.647421Z", "shell.execute_reply": "2023-01-03T14:11:20.646552Z"}, "papermill": {"duration": 3.551425, "end_time": "2023-01-03T14:11:20.649443", "exception": false, "start_time": "2023-01-03T14:11:17.098018", "status": "completed"}, "tags": []}, "outputs": [], "source": ["from functools import partial\n", "from typing import Sequence, Tuple, Union\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pytorch_lightning as pl\n", "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torchvision.transforms as transforms\n", "import torchvision.transforms.functional as VisionF\n", "from pytorch_lightning import Callback, LightningModule, Trainer\n", "from pytorch_lightning.callbacks import ModelCheckpoint\n", "from torch import Tensor\n", "from torch.utils.data import DataLoader\n", "from torchmetrics.functional import accuracy\n", "from torchvision.datasets import CIFAR10\n", "from torchvision.models.resnet import resnet18\n", "from torchvision.utils import make_grid\n", "\n", "batch_size = 32\n", "num_workers = 0  # to run notebook on CPU\n", "max_epochs = 200\n", "z_dim = 128"]}, {"cell_type": "markdown", "id": "4bbc2b02", "metadata": {"papermill": {"duration": 0.003483, "end_time": "2023-01-03T14:11:20.717365", "exception": false, "start_time": "2023-01-03T14:11:20.713882", "status": "completed"}, "tags": []}, "source": ["### Transforms\n", "\n", "We first define the data augmentation pipeline used in Barlow Twins. 
Here, we use the pipeline proposed in SimCLR, which generates two copies/views of an input image by applying the following transformations in sequence.\n", "\n", "First, it takes a random crop of the image and resizes it to a fixed pre-specified size. Then, it applies a left-to-right random flip with a probability of 0.5. This step is followed by a composition of color jitter, conversion to grayscale with a probability of 0.2, and the application of a Gaussian blur filter. Finally, we convert the image to a tensor and normalize it.\n", "\n", "Within this transform, we add a third view for our online finetuner, which we explain later on. In short, this additional transform lets us test our encoder on a downstream classification task."]}, {"cell_type": "code", "execution_count": 3, "id": "e1bc9fac", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:21.180488Z", "iopub.status.busy": "2023-01-03T14:11:21.179699Z", "iopub.status.idle": "2023-01-03T14:11:21.189128Z", "shell.execute_reply": "2023-01-03T14:11:21.188013Z"}, "papermill": {"duration": 0.373056, "end_time": "2023-01-03T14:11:21.191057", "exception": false, "start_time": "2023-01-03T14:11:20.818001", "status": "completed"}, "tags": []}, "outputs": [], "source": ["class BarlowTwinsTransform:\n", "    def __init__(self, train=True, input_height=224, gaussian_blur=True, jitter_strength=1.0, normalize=None):\n", "        self.input_height = input_height\n", "        self.gaussian_blur = gaussian_blur\n", "        self.jitter_strength = jitter_strength\n", "        self.normalize = normalize\n", "        self.train = train\n", "\n", "        color_jitter = transforms.ColorJitter(\n", "            0.8 * self.jitter_strength,\n", "            0.8 * self.jitter_strength,\n", "            0.8 * self.jitter_strength,\n", "            0.2 * self.jitter_strength,\n", "        )\n", "\n", "        color_transform = [transforms.RandomApply([color_jitter], p=0.8), 
transforms.RandomGrayscale(p=0.2)]\n", "\n", "        if self.gaussian_blur:\n", "            kernel_size = int(0.1 * self.input_height)\n", "            if kernel_size % 2 == 0:\n", "                kernel_size += 1\n", "\n", "            color_transform.append(transforms.RandomApply([transforms.GaussianBlur(kernel_size=kernel_size)], p=0.5))\n", "\n", "        self.color_transform = transforms.Compose(color_transform)\n", "\n", "        if normalize is None:\n", "            self.final_transform = transforms.ToTensor()\n", "        else:\n", "            self.final_transform = transforms.Compose([transforms.ToTensor(), normalize])\n", "\n", "        self.transform = transforms.Compose(\n", "            [\n", "                transforms.RandomResizedCrop(self.input_height),\n", "                transforms.RandomHorizontalFlip(p=0.5),\n", "                self.color_transform,\n", "                self.final_transform,\n", "            ]\n", "        )\n", "\n", "        self.finetune_transform = None\n", "        if self.train:\n", "            self.finetune_transform = transforms.Compose(\n", "                [\n", "                    transforms.RandomCrop(32, padding=4, padding_mode=\"reflect\"),\n", "                    transforms.RandomHorizontalFlip(),\n", "                    transforms.ToTensor(),\n", "                ]\n", "            )\n", "        else:\n", "            self.finetune_transform = transforms.ToTensor()\n", "\n", "    def __call__(self, sample):\n", "        return self.transform(sample), self.transform(sample), self.finetune_transform(sample)"]}, {"cell_type": "markdown", "id": "73b8f50f", "metadata": {"papermill": {"duration": 0.068577, "end_time": "2023-01-03T14:11:21.837189", "exception": false, "start_time": "2023-01-03T14:11:21.768612", "status": "completed"}, "tags": []}, "source": ["### Dataset\n", "\n", "We select CIFAR10 as the dataset to demonstrate the pre-training process for Barlow Twins. 
CIFAR10 images are 32x32 in size and we do not apply a Gaussian blur transformation on them. In this step, we create the training and validation dataloaders for CIFAR10."]}, {"cell_type": "code", "execution_count": 4, "id": "c5203e71", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:21.892168Z", "iopub.status.busy": "2023-01-03T14:11:21.891508Z", "iopub.status.idle": "2023-01-03T14:11:30.340106Z", "shell.execute_reply": "2023-01-03T14:11:30.339349Z"}, "papermill": {"duration": 8.468487, "end_time": "2023-01-03T14:11:30.342113", "exception": false, "start_time": "2023-01-03T14:11:21.873626", "status": "completed"}, "tags": []}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./cifar-10-python.tar.gz\n"]}, {"data": {"application/vnd.jupyter.widget-view+json": {"model_id": "9c7080f0c55149c584aca8c54558da86", "version_major": 2, "version_minor": 0}, "text/plain": ["  0%|          | 0/170498071 [00:00<?, ?it/s]"]}, "metadata": {}, "output_type": "display_data"}, {"name": "stdout", "output_type": "stream", "text": ["Extracting ./cifar-10-python.tar.gz to .\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Files already downloaded and verified\n"]}], "source": ["def cifar10_normalization():\n", "    normalize = transforms.Normalize(\n", "        mean=[x / 255.0 for x in [125.3, 123.0, 113.9]], std=[x / 255.0 for x in [63.0, 62.1, 66.7]]\n", "    )\n", "    return normalize\n", "\n", "\n", "train_transform = BarlowTwinsTransform(\n", "    train=True, input_height=32, gaussian_blur=False, jitter_strength=0.5, normalize=cifar10_normalization()\n", ")\n", "train_dataset = CIFAR10(root=\".\", train=True, download=True, transform=train_transform)\n", "\n", "val_transform = BarlowTwinsTransform(\n", "    train=False, input_height=32, gaussian_blur=False, jitter_strength=0.5, normalize=cifar10_normalization()\n", ")\n", "val_dataset = CIFAR10(root=\".\", 
train=False, download=True, transform=val_transform)\n", "\n", "train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers, drop_last=True)\n", "val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers, drop_last=True)"]}, {"cell_type": "markdown", "id": "3a224440", "metadata": {"papermill": {"duration": 0.032902, "end_time": "2023-01-03T14:11:30.528870", "exception": false, "start_time": "2023-01-03T14:11:30.495968", "status": "completed"}, "tags": []}, "source": ["### Plot images\n", "\n", "To see how the CIFAR10 images look after the data augmentation pipeline, we load a few images from the dataloader and plot them here."]}, {"cell_type": "code", "execution_count": 5, "id": "2dc18b5c", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:30.590427Z", "iopub.status.busy": "2023-01-03T14:11:30.589726Z", "iopub.status.idle": "2023-01-03T14:11:30.851179Z", "shell.execute_reply": "2023-01-03T14:11:30.850420Z"}, "papermill": {"duration": 0.3003, "end_time": "2023-01-03T14:11:30.854031", "exception": false, "start_time": "2023-01-03T14:11:30.553731", "status": "completed"}, "tags": []}, "outputs": [{"data": {"image/png": "\n", "text/plain": ["<Figure size 640x480 with 1 Axes>"]}, "metadata": {}, "output_type": "display_data"}], "source": ["for batch in val_loader:\n", "    (img1, img2, _), label = batch\n", "    break\n", "\n", "img_grid = make_grid(img1, normalize=True)\n", "\n", "\n", "def show(imgs):\n", "    if not isinstance(imgs, list):\n", "        imgs = [imgs]\n", "    fig, axs = plt.subplots(ncols=len(imgs), squeeze=False)\n", "    for i, img in enumerate(imgs):\n", "        img = img.detach()\n", "        img = VisionF.to_pil_image(img)\n", "        axs[0, i].imshow(np.asarray(img))\n", "        axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])\n", "\n", "\n", "show(img_grid)"]}, {"cell_type": "markdown", "id": "4dd3612a", "metadata": 
{"papermill": {"duration": 0.104, "end_time": "2023-01-03T14:11:31.013943", "exception": false, "start_time": "2023-01-03T14:11:30.909943", "status": "completed"}, "tags": []}, "source": ["### Barlow Twins Loss\n", "\n", "Here we define the loss function for Barlow Twins. It first normalizes the D-dimensional vectors from the projection head along the batch dimension and then computes the DxD cross-correlation matrix between the normalized vectors of the 2 views of each image.\n", "\n", "Then it splits this cross-correlation matrix into two parts. The first part, the diagonal of this matrix, is brought closer to 1, which pushes up the correlation between corresponding dimensions of the latent vectors of the two views of each image, thus making the backbone invariant to the transformations applied to the views. The second part of the loss pushes the non-diagonal elements of the cross-correlation matrix closer to 0. This reduces the redundancy between the different dimensions of the latent vector."]}, {"cell_type": "code", "execution_count": 6, "id": "a47bdc95", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:31.409899Z", "iopub.status.busy": "2023-01-03T14:11:31.408894Z", "iopub.status.idle": "2023-01-03T14:11:31.416240Z", "shell.execute_reply": "2023-01-03T14:11:31.415270Z"}, "papermill": {"duration": 0.016631, "end_time": "2023-01-03T14:11:31.417854", "exception": false, "start_time": "2023-01-03T14:11:31.401223", "status": "completed"}, "tags": []}, "outputs": [], "source": ["class BarlowTwinsLoss(nn.Module):\n", "    def __init__(self, batch_size, lambda_coeff=5e-3, z_dim=128):\n", "        super().__init__()\n", "\n", "        self.z_dim = z_dim\n", "        self.batch_size = batch_size\n", "        self.lambda_coeff = lambda_coeff\n", "\n", "    def off_diagonal_ele(self, x):\n", "        # taken from: https://github.com/facebookresearch/barlowtwins/blob/main/main.py\n", "        # return a flattened view of the off-diagonal elements of a square matrix\n", "        n, m = x.shape\n", "      
  assert n == m\n", "        return x.flatten()[:-1].view(n - 1, n + 1)[:, 1:].flatten()\n", "\n", "    def forward(self, z1, z2):\n", "        # N x D, where N is the batch size and D is output dim of projection head\n", "        z1_norm = (z1 - torch.mean(z1, dim=0)) / torch.std(z1, dim=0)\n", "        z2_norm = (z2 - torch.mean(z2, dim=0)) / torch.std(z2, dim=0)\n", "\n", "        cross_corr = torch.matmul(z1_norm.T, z2_norm) / self.batch_size\n", "\n", "        on_diag = torch.diagonal(cross_corr).add_(-1).pow_(2).sum()\n", "        off_diag = self.off_diagonal_ele(cross_corr).pow_(2).sum()\n", "\n", "        return on_diag + self.lambda_coeff * off_diag"]}, {"cell_type": "markdown", "id": "751b224d", "metadata": {"papermill": {"duration": 0.10387, "end_time": "2023-01-03T14:11:32.573956", "exception": false, "start_time": "2023-01-03T14:11:32.470086", "status": "completed"}, "tags": []}, "source": ["### Backbone\n", "\n", "This is a standard Resnet backbone that we pre-train using the Barlow Twins method. To accommodate the 32x32 CIFAR10 images, we replace the first 7x7 convolution of the Resnet backbone by a 3x3 filter. 
We also remove the first Maxpool layer from the network for CIFAR10 images."]}, {"cell_type": "code", "execution_count": 7, "id": "f2423fbc", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:33.902175Z", "iopub.status.busy": "2023-01-03T14:11:33.901343Z", "iopub.status.idle": "2023-01-03T14:11:34.065025Z", "shell.execute_reply": "2023-01-03T14:11:34.064230Z"}, "papermill": {"duration": 1.486797, "end_time": "2023-01-03T14:11:34.067003", "exception": false, "start_time": "2023-01-03T14:11:32.580206", "status": "completed"}, "tags": []}, "outputs": [], "source": ["encoder = resnet18()\n", "\n", "# for CIFAR10, replace the first 7x7 conv with smaller 3x3 conv and remove the first maxpool\n", "encoder.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)\n", "encoder.maxpool = nn.MaxPool2d(kernel_size=1, stride=1)\n", "\n", "# replace classification fc layer of Resnet to obtain representations from the backbone\n", "encoder.fc = nn.Identity()"]}, {"cell_type": "markdown", "id": "46df99c9", "metadata": {"papermill": {"duration": 0.008469, "end_time": "2023-01-03T14:11:34.109849", "exception": false, "start_time": "2023-01-03T14:11:34.101380", "status": "completed"}, "tags": []}, "source": ["### Projection head\n", "\n", "Unlike SimCLR and BYOL, the downstream performance of Barlow Twins greatly benefits from having a larger projection head after the backbone network. The paper utilizes a 3-layer MLP with 8192 hidden dimensions and 8192 as the output dimension of the projection head. For the purposes of the tutorial, we use a smaller projection head. 
Note, however, that in practice Barlow Twins needs to be trained using a bigger projection head, as it is highly sensitive to its architecture and output dimensionality."]}, {"cell_type": "code", "execution_count": 8, "id": "0c9060b2", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:34.202989Z", "iopub.status.busy": "2023-01-03T14:11:34.202663Z", "iopub.status.idle": "2023-01-03T14:11:34.207863Z", "shell.execute_reply": "2023-01-03T14:11:34.207200Z"}, "papermill": {"duration": 0.045601, "end_time": "2023-01-03T14:11:34.209453", "exception": false, "start_time": "2023-01-03T14:11:34.163852", "status": "completed"}, "tags": []}, "outputs": [], "source": ["class ProjectionHead(nn.Module):\n", "    def __init__(self, input_dim=2048, hidden_dim=2048, output_dim=128):\n", "        super().__init__()\n", "\n", "        self.projection_head = nn.Sequential(\n", "            nn.Linear(input_dim, hidden_dim, bias=True),\n", "            nn.BatchNorm1d(hidden_dim),\n", "            nn.ReLU(),\n", "            nn.Linear(hidden_dim, output_dim, bias=False),\n", "        )\n", "\n", "    def forward(self, x):\n", "        return self.projection_head(x)"]}, {"cell_type": "markdown", "id": "fe18b6c1", "metadata": {"papermill": {"duration": 0.052472, "end_time": "2023-01-03T14:11:34.281587", "exception": false, "start_time": "2023-01-03T14:11:34.229115", "status": "completed"}, "tags": []}, "source": ["### Learning rate warmup\n", "\n", "For the purposes of this tutorial, we keep things simple and use a linear warmup schedule with the Adam optimizer. 
In our previous experiments, we found that the linear warmup part is much more important for the final performance of a model than the cosine decay component of the schedule."]}, {"cell_type": "code", "execution_count": 9, "id": "8e6d5b1f", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:34.424092Z", "iopub.status.busy": "2023-01-03T14:11:34.423356Z", "iopub.status.idle": "2023-01-03T14:11:34.428598Z", "shell.execute_reply": "2023-01-03T14:11:34.427900Z"}, "papermill": {"duration": 0.112423, "end_time": "2023-01-03T14:11:34.430187", "exception": false, "start_time": "2023-01-03T14:11:34.317764", "status": "completed"}, "tags": []}, "outputs": [], "source": ["def fn(warmup_steps, step):\n", "    if step < warmup_steps:\n", "        return float(step) / float(max(1, warmup_steps))\n", "    else:\n", "        return 1.0\n", "\n", "\n", "def linear_warmup_decay(warmup_steps):\n", "    return partial(fn, warmup_steps)"]}, {"cell_type": "markdown", "id": "3e56f841", "metadata": {"papermill": {"duration": 1.425889, "end_time": "2023-01-03T14:11:35.951834", "exception": false, "start_time": "2023-01-03T14:11:34.525945", "status": "completed"}, "tags": []}, "source": ["### Barlow Twins Lightning Module\n", "\n", "We keep the LightningModule for Barlow Twins neat and simple. It takes in a backbone encoder and initializes the projection head and the loss function. 
We configure the optimizer and the learning rate scheduler in the ``configure_optimizers`` method."]}, {"cell_type": "code", "execution_count": 10, "id": "f35509dc", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:36.011264Z", "iopub.status.busy": "2023-01-03T14:11:36.010484Z", "iopub.status.idle": "2023-01-03T14:11:36.019724Z", "shell.execute_reply": "2023-01-03T14:11:36.019051Z"}, "papermill": {"duration": 0.033904, "end_time": "2023-01-03T14:11:36.021300", "exception": false, "start_time": "2023-01-03T14:11:35.987396", "status": "completed"}, "tags": []}, "outputs": [], "source": ["class BarlowTwins(LightningModule):\n", "    def __init__(\n", "        self,\n", "        encoder,\n", "        encoder_out_dim,\n", "        num_training_samples,\n", "        batch_size,\n", "        lambda_coeff=5e-3,\n", "        z_dim=128,\n", "        learning_rate=1e-4,\n", "        warmup_epochs=10,\n", "        max_epochs=200,\n", "    ):\n", "        super().__init__()\n", "\n", "        self.encoder = encoder\n", "        self.projection_head = ProjectionHead(input_dim=encoder_out_dim, hidden_dim=encoder_out_dim, output_dim=z_dim)\n", "        self.loss_fn = BarlowTwinsLoss(batch_size=batch_size, lambda_coeff=lambda_coeff, z_dim=z_dim)\n", "\n", "        self.learning_rate = learning_rate\n", "        self.warmup_epochs = warmup_epochs\n", "        self.max_epochs = max_epochs\n", "\n", "        self.train_iters_per_epoch = num_training_samples // batch_size\n", "\n", "    def forward(self, x):\n", "        return self.encoder(x)\n", "\n", "    def shared_step(self, batch):\n", "        (x1, x2, _), _ = batch\n", "\n", "        z1 = self.projection_head(self.encoder(x1))\n", "        z2 = self.projection_head(self.encoder(x2))\n", "\n", "        return self.loss_fn(z1, z2)\n", "\n", "    def training_step(self, batch, batch_idx):\n", "        loss = self.shared_step(batch)\n", "        self.log(\"train_loss\", loss, on_step=True, on_epoch=False)\n", "   
     return loss\n", "\n", "    def validation_step(self, batch, batch_idx):\n", "        loss = self.shared_step(batch)\n", "        self.log(\"val_loss\", loss, on_step=False, on_epoch=True)\n", "\n", "    def configure_optimizers(self):\n", "        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)\n", "\n", "        warmup_steps = self.train_iters_per_epoch * self.warmup_epochs\n", "\n", "        scheduler = {\n", "            \"scheduler\": torch.optim.lr_scheduler.LambdaLR(\n", "                optimizer,\n", "                linear_warmup_decay(warmup_steps),\n", "            ),\n", "            \"interval\": \"step\",\n", "            \"frequency\": 1,\n", "        }\n", "\n", "        return [optimizer], [scheduler]"]}, {"cell_type": "markdown", "id": "c3413765", "metadata": {"papermill": {"duration": 0.020526, "end_time": "2023-01-03T14:11:36.052734", "exception": false, "start_time": "2023-01-03T14:11:36.032208", "status": "completed"}, "tags": []}, "source": ["### Evaluation\n", "\n", "We define a callback which appends a linear layer on top of the encoder and trains the classification evaluation head in an online manner. We make sure not to backpropagate the gradients back to the encoder while tuning the linear layer. 
This technique was used in SimCLR as well, where the authors showed that the final downstream classification performance is very similar to the results of online finetuning as training progresses."]}, {"cell_type": "code", "execution_count": 11, "id": "0e793c50", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:36.113318Z", "iopub.status.busy": "2023-01-03T14:11:36.112998Z", "iopub.status.idle": "2023-01-03T14:11:36.124201Z", "shell.execute_reply": "2023-01-03T14:11:36.123471Z"}, "papermill": {"duration": 0.028007, "end_time": "2023-01-03T14:11:36.125800", "exception": false, "start_time": "2023-01-03T14:11:36.097793", "status": "completed"}, "tags": []}, "outputs": [], "source": ["class OnlineFineTuner(Callback):\n", "    def __init__(\n", "        self,\n", "        encoder_output_dim: int,\n", "        num_classes: int,\n", "    ) -> None:\n", "        super().__init__()\n", "\n", "        self.optimizer: torch.optim.Optimizer\n", "\n", "        self.encoder_output_dim = encoder_output_dim\n", "        self.num_classes = num_classes\n", "\n", "    def on_fit_start(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None:\n", "        # add linear_eval layer and optimizer\n", "        pl_module.online_finetuner = nn.Linear(self.encoder_output_dim, self.num_classes).to(pl_module.device)\n", "        self.optimizer = torch.optim.Adam(pl_module.online_finetuner.parameters(), lr=1e-4)\n", "\n", "    def extract_online_finetuning_view(\n", "        self, batch: Sequence, device: Union[str, torch.device]\n", "    ) -> Tuple[Tensor, Tensor]:\n", "        (_, _, finetune_view), y = batch\n", "        finetune_view = finetune_view.to(device)\n", "        y = y.to(device)\n", "\n", "        return finetune_view, y\n", "\n", "    def on_train_batch_end(\n", "        self,\n", "        trainer: pl.Trainer,\n", "        pl_module: pl.LightningModule,\n", "        outputs: Sequence,\n", "        batch: Sequence,\n", "        batch_idx: int,\n", "    
    dataloader_idx: int,\n", "    ) -> None:\n", "        x, y = self.extract_online_finetuning_view(batch, pl_module.device)\n", "\n", "        with torch.no_grad():\n", "            feats = pl_module(x)\n", "\n", "        feats = feats.detach()\n", "        preds = pl_module.online_finetuner(feats)\n", "        loss = F.cross_entropy(preds, y)\n", "\n", "        loss.backward()\n", "        self.optimizer.step()\n", "        self.optimizer.zero_grad()\n", "\n", "        acc = accuracy(F.softmax(preds, dim=1), y)\n", "        pl_module.log(\"online_train_acc\", acc, on_step=True, on_epoch=False)\n", "        pl_module.log(\"online_train_loss\", loss, on_step=True, on_epoch=False)\n", "\n", "    def on_validation_batch_end(\n", "        self,\n", "        trainer: pl.Trainer,\n", "        pl_module: pl.LightningModule,\n", "        outputs: Sequence,\n", "        batch: Sequence,\n", "        batch_idx: int,\n", "        dataloader_idx: int,\n", "    ) -> None:\n", "        x, y = self.extract_online_finetuning_view(batch, pl_module.device)\n", "\n", "        with torch.no_grad():\n", "            feats = pl_module(x)\n", "\n", "        feats = feats.detach()\n", "        preds = pl_module.online_finetuner(feats)\n", "        loss = F.cross_entropy(preds, y)\n", "\n", "        acc = accuracy(F.softmax(preds, dim=1), y)\n", "        pl_module.log(\"online_val_acc\", acc, on_step=False, on_epoch=True, sync_dist=True)\n", "        pl_module.log(\"online_val_loss\", loss, on_step=False, on_epoch=True, sync_dist=True)"]}, {"cell_type": "markdown", "id": "cff362c6", "metadata": {"papermill": {"duration": 0.047791, "end_time": "2023-01-03T14:11:36.196803", "exception": false, "start_time": "2023-01-03T14:11:36.149012", "status": "completed"}, "tags": []}, "source": ["Finally, we define the trainer for training the model. 
We pass the ``train_loader`` and ``val_loader`` we initialized earlier to the ``fit`` function."]}, {"cell_type": "code", "execution_count": 12, "id": "ee9a9323", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:36.221931Z", "iopub.status.busy": "2023-01-03T14:11:36.221141Z", "iopub.status.idle": "2023-01-03T14:11:36.603716Z", "shell.execute_reply": "2023-01-03T14:11:36.603041Z"}, "papermill": {"duration": 0.399597, "end_time": "2023-01-03T14:11:36.605272", "exception": false, "start_time": "2023-01-03T14:11:36.205675", "status": "completed"}, "tags": []}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.\n"]}, {"name": "stderr", "output_type": "stream", "text": ["GPU available: True (cuda), used: True\n"]}, {"name": "stderr", "output_type": "stream", "text": ["TPU available: False, using: 0 TPU cores\n"]}, {"name": "stderr", "output_type": "stream", "text": ["IPU available: False, using: 0 IPUs\n"]}, {"name": "stderr", "output_type": "stream", "text": ["HPU available: False, using: 0 HPUs\n"]}], "source": ["encoder_out_dim = 512\n", "\n", "model = BarlowTwins(\n", "    encoder=encoder,\n", "    encoder_out_dim=encoder_out_dim,\n", "    num_training_samples=len(train_dataset),\n", "    batch_size=batch_size,\n", "    z_dim=z_dim,\n", ")\n", "\n", "online_finetuner = OnlineFineTuner(encoder_output_dim=encoder_out_dim, num_classes=10)\n", "checkpoint_callback = ModelCheckpoint(every_n_epochs=100, save_top_k=-1, save_last=True)\n", "\n", "trainer = Trainer(\n", "    max_epochs=max_epochs,\n", "    accelerator=\"auto\",\n", "    devices=1 if torch.cuda.is_available() else None,  # limiting for iPython runs\n", "    callbacks=[online_finetuner, checkpoint_callback],\n", ")\n", "\n", "# uncomment this to train the model\n", "# this is done for the tutorial so that the notebook compiles\n", "# trainer.fit(model, train_loader, 
val_loader)"]}, {"cell_type": "markdown", "id": "c1bcf484", "metadata": {"papermill": {"duration": 0.044822, "end_time": "2023-01-03T14:11:38.291529", "exception": false, "start_time": "2023-01-03T14:11:38.246707", "status": "completed"}, "tags": []}, "source": ["### Using the trained encoder for downstream tasks\n", "\n", "Once the encoder is pretrained on CIFAR10, we can use it to get image embeddings and use them further downstream on tasks like classification, detection, segmentation, etc.\n", "\n", "In this tutorial, we did not completely train our encoder for hundreds of epochs using the Barlow Twins pretraining method. So, we will load the pretrained encoder weights from a checkpoint and show the image embeddings obtained from that.\n", "\n", "To create this checkpoint, the encoder was pretrained for 200 epochs, and obtained an online finetune accuracy of x% on CIFAR-10."]}, {"cell_type": "code", "execution_count": 13, "id": "9cd56c8d", "metadata": {"execution": {"iopub.execute_input": "2023-01-03T14:11:38.352115Z", "iopub.status.busy": "2023-01-03T14:11:38.351337Z", "iopub.status.idle": "2023-01-03T14:11:38.802463Z", "shell.execute_reply": "2023-01-03T14:11:38.801722Z"}, "papermill": {"duration": 0.490473, "end_time": "2023-01-03T14:11:38.804040", "exception": false, "start_time": "2023-01-03T14:11:38.313567", "status": "completed"}, "tags": []}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["torch.Size([4, 512])\n"]}], "source": ["# ckpt_model = torch.load('')  # upload checkpoint to aws\n", "# encoder = ckpt_model.encoder\n", "encoder = model.encoder\n", "\n", "downstream_dataset = CIFAR10(root=\".\", train=False, transform=transforms.ToTensor())\n", "dataloader = DataLoader(downstream_dataset, batch_size=4, shuffle=False)\n", "\n", "for batch in dataloader:\n", "    img, label = batch\n", "    print(encoder(img).shape)\n", "    break"]}, {"cell_type": "markdown", "id": "1df0d4d8", "metadata": {"papermill": {"duration": 0.207975, "end_time": 
"2023-01-03T14:11:39.102011", "exception": false, "start_time": "2023-01-03T14:11:38.894036", "status": "completed"}, "tags": []}, "source": ["## Congratulations - Time to Join the Community!\n", "\n", "Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the Lightning\n", "movement, you can do so in the following ways!\n", "\n", "### Star [Lightning](https://github.com/Lightning-AI/lightning) on GitHub\n", "The easiest way to help our community is just by starring the GitHub repos! This helps raise awareness of the cool\n", "tools we're building.\n", "\n", "### Join our [Slack](https://www.pytorchlightning.ai/community)!\n", "The best way to keep up to date on the latest advancements is to join our community! Make sure to introduce yourself\n", "and share your interests in `#general` channel\n", "\n", "\n", "### Contributions !\n", "The best way to contribute to our community is to become a code contributor! At any time you can go to\n", "[Lightning](https://github.com/Lightning-AI/lightning) or [Bolt](https://github.com/Lightning-AI/lightning-bolts)\n", "GitHub Issues page and filter for \"good first issue\".\n", "\n", "* [Lightning good first issue](https://github.com/Lightning-AI/lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)\n", "* [Bolt good first issue](https://github.com/Lightning-AI/lightning-bolts/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)\n", "* You can also contribute your own notebooks with useful examples !\n", "\n", "### Great thanks from the entire Pytorch Lightning Team for your interest !\n", "\n", "[![Pytorch Lightning](){height=\"60px\" width=\"240px\"}](https://pytorchlightning.ai)"]}, {"cell_type": "raw", "metadata": {"raw_mimetype": "text/restructuredtext"}, "source": [".. customcarditem::\n", "   :header: Barlow Twins Tutorial\n", "   :card_description: This notebook describes the self-supervised learning method Barlow Twins. 
Barlow Twins differs from other recently proposed algorithms as it doesn't fall under the category of...\n", "   :tags: Image,Self-Supervised,GPU/TPU,Lightning-Examples"]}], "metadata": {"jupytext": {"cell_metadata_filter": "colab,colab_type,id,-all", "formats": "ipynb,py:percent", "main_language": "python"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16"}, "papermill": {"default_parameters": {}, "duration": 30.194704, "end_time": "2023-01-03T14:11:39.969641", "environment_variables": {}, "exception": null, "input_path": "lightning_examples/barlow-twins/barlow_twins.ipynb", "output_path": ".notebooks/lightning_examples/barlow-twins.ipynb", "parameters": {}, "start_time": "2023-01-03T14:11:09.774937", "version": "2.4.0"}, "widgets": {"application/vnd.jupyter.widget-state+json": {"state": {"1f3f15ee41004a4fb50d0e2e4c445423": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": 
null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "29f471718c9d435c8e688165764dad08": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "bar_color": null, "description_width": ""}}, "32df8907d8ad4492bd27800758e7f649": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_3859e01b143a4b98a765c23f3fec02dc", "placeholder": "\u200b", "style": "IPY_MODEL_93246f4655e8445897b37ce2ecbe4047", "tabbable": null, "tooltip": null, "value": " 170498071/170498071 [00:04&lt;00:00, 50745212.15it/s]"}}, "3859e01b143a4b98a765c23f3fec02dc": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, 
"grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "3ec46cc79f9b489b89fa9aee0cf7bb14": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_1f3f15ee41004a4fb50d0e2e4c445423", "max": 170498071.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_29f471718c9d435c8e688165764dad08", "tabbable": null, "tooltip": null, "value": 170498071.0}}, "476090c001a6421f9596e56c6835ce9f": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_ba30d5cf5f05433c85ca22f5a6a98f8e", "placeholder": "\u200b", "style": "IPY_MODEL_544b461b82964864b10bea68ad6faf26", "tabbable": null, "tooltip": null, "value": "100%"}}, "53abe81779c9401d8aafb1959ffc6b4d": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": 
{"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "544b461b82964864b10bea68ad6faf26": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "93246f4655e8445897b37ce2ecbe4047": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, 
"9c7080f0c55149c584aca8c54558da86": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": ["IPY_MODEL_476090c001a6421f9596e56c6835ce9f", "IPY_MODEL_3ec46cc79f9b489b89fa9aee0cf7bb14", "IPY_MODEL_32df8907d8ad4492bd27800758e7f649"], "layout": "IPY_MODEL_53abe81779c9401d8aafb1959ffc6b4d", "tabbable": null, "tooltip": null}}, "ba30d5cf5f05433c85ca22f5a6a98f8e": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}}, "version_major": 2, "version_minor": 0}}}, "nbformat": 4, "nbformat_minor": 5}