{"cells": [{"cell_type": "markdown", "id": "d8ba6a1e", "metadata": {"papermill": {"duration": 0.018331, "end_time": "2023-03-14T16:30:03.464291", "exception": false, "start_time": "2023-03-14T16:30:03.445960", "status": "completed"}, "tags": []}, "source": ["\n", "# Tutorial 13: Self-Supervised Contrastive Learning with SimCLR\n", "\n", "* **Author:** Phillip Lippe\n", "* **License:** CC BY-SA\n", "* **Generated:** 2023-03-14T16:28:29.031195\n", "\n", "In this tutorial, we will take a closer look at self-supervised contrastive learning.\n", "Self-supervised learning, sometimes also called unsupervised learning, describes the scenario where we are given input data but no accompanying labels to train on in a classical supervised way.\n", "However, this data still contains a lot of information from which we can learn: how are the images different from each other?\n", "What patterns are descriptive for certain images?\n", "Can we cluster the images?\n", "To gain insight into these questions, we will implement a popular, simple contrastive learning method, SimCLR, and apply it to the STL10 dataset.\n", "This notebook is part of a lecture series on Deep Learning at the University of Amsterdam.\n", "The full list of tutorials can be found at https://uvadlc-notebooks.rtfd.io.\n", "\n", "\n", "---\n", "Open in [![Open In Colab](){height=\"20px\" width=\"117px\"}](https://colab.research.google.com/github/PytorchLightning/lightning-tutorials/blob/publication/.notebooks/course_UvA-DL/13-contrastive-learning.ipynb)\n", "\n", "Give us a \u2b50 [on Github](https://www.github.com/Lightning-AI/lightning/)\n", "| Check out [the documentation](https://pytorch-lightning.readthedocs.io/en/stable/)\n", "| Join us [on Slack](https://www.pytorchlightning.ai/community)"]}, {"cell_type": "markdown", "id": "4d4b9328", "metadata": {"papermill": {"duration": 0.012168, "end_time": "2023-03-14T16:30:03.487815", "exception": false, "start_time": "2023-03-14T16:30:03.475647", "status": 
"completed"}, "tags": []}, "source": ["## Setup\n", "This notebook requires some packages besides pytorch-lightning."]}, {"cell_type": "code", "execution_count": 1, "id": "77c825b4", "metadata": {"colab": {}, "colab_type": "code", "execution": {"iopub.execute_input": "2023-03-14T16:30:03.544045Z", "iopub.status.busy": "2023-03-14T16:30:03.543683Z", "iopub.status.idle": "2023-03-14T16:30:06.843558Z", "shell.execute_reply": "2023-03-14T16:30:06.842189Z"}, "id": "LfrJLKPFyhsK", "lines_to_next_cell": 0, "papermill": {"duration": 3.316274, "end_time": "2023-03-14T16:30:06.847061", "exception": false, "start_time": "2023-03-14T16:30:03.530787", "status": "completed"}, "tags": []}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\r\n", "\u001b[0m"]}], "source": ["! pip install --quiet \"ipython[notebook]>=8.0.0, <8.12.0\" \"torchmetrics>=0.7, <0.12\" \"seaborn\" \"torchvision\" \"setuptools==67.4.0\" \"matplotlib\" \"lightning>=2.0.0rc0\" \"torch>=1.8.1, <1.14.0\" \"pytorch-lightning>=1.4, <2.0.0\""]}, {"cell_type": "markdown", "id": "787d2fad", "metadata": {"papermill": {"duration": 0.011187, "end_time": "2023-03-14T16:30:06.875314", "exception": false, "start_time": "2023-03-14T16:30:06.864127", "status": "completed"}, "tags": []}, "source": ["
\n", "Methods for self-supervised learning try to learn as much as possible from the data alone, so that the resulting model can quickly be finetuned for a specific classification task.\n", "The benefit of self-supervised learning is that a large dataset can often easily be obtained.\n", "For instance, if we want to train a vision model on semantic segmentation for autonomous driving, we can collect large amounts of data by simply installing a camera in a car, and driving through a city for an hour.\n", "In contrast, if we wanted to do supervised learning, we would have to manually label all those images before training a model.\n", "This is extremely expensive: manually labeling the same amount of data would likely take a couple of months.\n", "Further, self-supervised learning can provide an alternative to transfer learning from models pretrained on ImageNet since we could pretrain a model on a specific dataset/situation, e.g. traffic scenarios for autonomous driving.\n", "\n", "Within the last two years, a lot of new approaches have been proposed for self-supervised learning, in particular for images, that have resulted in great improvements over supervised models when few labels are available.\n", "The subfield that we will focus on in this tutorial is contrastive learning.\n", "Contrastive learning is motivated by the question mentioned above: how are images different from each other?\n", "Specifically, contrastive learning methods train a model to cluster an image and its slightly augmented version in latent space, while the distance to other images should be maximized.\n", "A very recent and simple method for this is [SimCLR](https://arxiv.org/abs/2002.05709), which is visualized below (figure credit - [Ting Chen et al.](https://simclr.github.io/)).\n", "\n", "
![simclr contrastive learning](){width=\"500px\"}
\n", "\n", "The general setup is that we are given a dataset of images without any labels, and want to train a model on this data such that it can quickly adapt to any image recognition task afterward.\n", "During each training iteration, we sample a batch of images as usual.\n", "For each image, we create two versions by applying data augmentation techniques like cropping, Gaussian noise, blurring, etc.\n", "An example is shown on the left with the image of the dog.\n", "We will go into the details and effects of the chosen augmentation techniques later.\n", "On those images, we apply a CNN like ResNet and obtain as output a 1D feature vector on which we apply a small MLP.\n", "The output features of the two augmented images are then trained to be close to each other, while all other images in that batch should be as different as possible.\n", "This way, the model has to learn to recognize the content of the image that remains unchanged under the data augmentations, such as objects which we usually care about in supervised tasks.\n", "\n", "We will now implement this framework ourselves and discuss further details along the way.\n", "Let's start by importing our standard libraries below:"]}, {"cell_type": "code", "execution_count": 2, "id": "979b6759", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:30:06.899524Z", "iopub.status.busy": "2023-03-14T16:30:06.899125Z", "iopub.status.idle": "2023-03-14T16:30:10.208909Z", "shell.execute_reply": "2023-03-14T16:30:10.207877Z"}, "papermill": {"duration": 3.324039, "end_time": "2023-03-14T16:30:10.210442", "exception": false, "start_time": "2023-03-14T16:30:06.886403", "status": "completed"}, "tags": []}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["Global seed set to 42\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Device: cuda:0\n", "Number of workers: 64\n"]}, {"data": {"text/plain": ["
"]}, "metadata": {}, "output_type": "display_data"}], "source": ["import os\n", "import urllib.request\n", "from copy import deepcopy\n", "from urllib.error import HTTPError\n", "\n", "import lightning as L\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "import matplotlib_inline.backend_inline\n", "import seaborn as sns\n", "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "import torch.utils.data as data\n", "import torchvision\n", "from lightning.pytorch.callbacks import LearningRateMonitor, ModelCheckpoint\n", "from torchvision import transforms\n", "from torchvision.datasets import STL10\n", "from tqdm.notebook import tqdm\n", "\n", "plt.set_cmap(\"cividis\")\n", "%matplotlib inline\n", "matplotlib_inline.backend_inline.set_matplotlib_formats(\"svg\", \"pdf\") # For export\n", "matplotlib.rcParams[\"lines.linewidth\"] = 2.0\n", "sns.set()\n", "\n", "# Import tensorboard\n", "%load_ext tensorboard\n", "\n", "# Path to the folder where the datasets are/should be downloaded (e.g. CIFAR10)\n", "DATASET_PATH = os.environ.get(\"PATH_DATASETS\", \"data/\")\n", "# Path to the folder where the pretrained models are saved\n", "CHECKPOINT_PATH = os.environ.get(\"PATH_CHECKPOINT\", \"saved_models/ContrastiveLearning/\")\n", "# In this notebook, we use data loaders with heavier computational processing. 
It is recommended to use as many\n", "# workers as possible in a data loader, which corresponds to the number of CPU cores\n", "NUM_WORKERS = os.cpu_count()\n", "\n", "# Setting the seed\n", "L.seed_everything(42)\n", "\n", "# Ensure that all operations are deterministic on GPU (if used) for reproducibility\n", "torch.backends.cudnn.deterministic = True\n", "torch.backends.cudnn.benchmark = False\n", "\n", "device = torch.device(\"cuda:0\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", "print(\"Device:\", device)\n", "print(\"Number of workers:\", NUM_WORKERS)"]}, {"cell_type": "markdown", "id": "fb7a7bfa", "metadata": {"papermill": {"duration": 0.011473, "end_time": "2023-03-14T16:30:10.234242", "exception": false, "start_time": "2023-03-14T16:30:10.222769", "status": "completed"}, "tags": []}, "source": ["As in many tutorials before, we provide pre-trained models.\n", "Note that those models are slightly larger than normal (~100MB overall) since we use the default ResNet-18 architecture.\n", "If you are running this notebook locally, make sure to have sufficient disk space available."]}, {"cell_type": "code", "execution_count": 3, "id": "73a4609a", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:30:10.259283Z", "iopub.status.busy": "2023-03-14T16:30:10.258531Z", "iopub.status.idle": "2023-03-14T16:30:14.969799Z", "shell.execute_reply": "2023-03-14T16:30:14.968407Z"}, "papermill": {"duration": 4.726886, "end_time": "2023-03-14T16:30:14.972502", "exception": false, "start_time": "2023-03-14T16:30:10.245616", "status": "completed"}, "tags": []}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Downloading https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/SimCLR.ckpt...\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Downloading https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/ResNet.ckpt...\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Downloading 
https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/tensorboards/SimCLR/events.out.tfevents.SimCLR...\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Downloading https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/tensorboards/classification/ResNet/events.out.tfevents.ResNet...\n", "Downloading https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/LogisticRegression_10.ckpt...\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Downloading https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/LogisticRegression_20.ckpt...\n", "Downloading https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/LogisticRegression_50.ckpt...\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Downloading https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/LogisticRegression_100.ckpt...\n", "Downloading https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/LogisticRegression_200.ckpt...\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Downloading https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/LogisticRegression_500.ckpt...\n"]}], "source": ["# Github URL where saved models are stored for this tutorial\n", "base_url = \"https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial17/\"\n", "# Files to download\n", "pretrained_files = [\n", " \"SimCLR.ckpt\",\n", " \"ResNet.ckpt\",\n", " \"tensorboards/SimCLR/events.out.tfevents.SimCLR\",\n", " \"tensorboards/classification/ResNet/events.out.tfevents.ResNet\",\n", "]\n", "pretrained_files += [f\"LogisticRegression_{size}.ckpt\" for size in [10, 20, 50, 100, 200, 500]]\n", "# Create checkpoint path if it doesn't exist yet\n", "os.makedirs(CHECKPOINT_PATH, exist_ok=True)\n", "\n", "# For each file, check whether it already exists. 
If not, try downloading it.\n", "for file_name in pretrained_files:\n", "    file_path = os.path.join(CHECKPOINT_PATH, file_name)\n", "    if \"/\" in file_name:\n", "        os.makedirs(file_path.rsplit(\"/\", 1)[0], exist_ok=True)\n", "    if not os.path.isfile(file_path):\n", "        file_url = base_url + file_name\n", "        print(f\"Downloading {file_url}...\")\n", "        try:\n", "            urllib.request.urlretrieve(file_url, file_path)\n", "        except HTTPError as e:\n", "            print(\n", "                \"Something went wrong. Please try to download the file from the GDrive folder, or contact the author with the full output including the following error:\\n\",\n", "                e,\n", "            )"]}, {"cell_type": "markdown", "id": "4438e998", "metadata": {"papermill": {"duration": 0.011733, "end_time": "2023-03-14T16:30:15.000836", "exception": false, "start_time": "2023-03-14T16:30:14.989103", "status": "completed"}, "tags": []}, "source": ["## SimCLR\n", "\n", "We will start our exploration of contrastive learning by discussing the effect of different data augmentation techniques, and how we can implement an efficient data loader for them.\n", "Next, we implement SimCLR with PyTorch Lightning, and finally train it on a large, unlabeled dataset."]}, {"cell_type": "markdown", "id": "03954ff6", "metadata": {"lines_to_next_cell": 2, "papermill": {"duration": 0.011708, "end_time": "2023-03-14T16:30:15.024227", "exception": false, "start_time": "2023-03-14T16:30:15.012519", "status": "completed"}, "tags": []}, "source": ["### Data Augmentation for Contrastive Learning\n", "\n", "To allow efficient training, we need to prepare the data loading such that we sample two different, random augmentations for each image in the batch.\n", "The easiest way to do this is by creating a transformation that, when called, applies a set of data augmentations to an image twice.\n", "This is implemented in the class `ContrastiveTransformations` below:"]}, {"cell_type": "code", "execution_count": 4, "id": "04fa3abc", "metadata": {"execution": 
{"iopub.execute_input": "2023-03-14T16:30:15.049654Z", "iopub.status.busy": "2023-03-14T16:30:15.049213Z", "iopub.status.idle": "2023-03-14T16:30:15.056829Z", "shell.execute_reply": "2023-03-14T16:30:15.055826Z"}, "papermill": {"duration": 0.023289, "end_time": "2023-03-14T16:30:15.059153", "exception": false, "start_time": "2023-03-14T16:30:15.035864", "status": "completed"}, "tags": []}, "outputs": [], "source": ["class ContrastiveTransformations:\n", "    def __init__(self, base_transforms, n_views=2):\n", "        self.base_transforms = base_transforms\n", "        self.n_views = n_views\n", "\n", "    def __call__(self, x):\n", "        return [self.base_transforms(x) for _ in range(self.n_views)]"]}, {"cell_type": "markdown", "id": "9865de3c", "metadata": {"papermill": {"duration": 0.01361, "end_time": "2023-03-14T16:30:15.089696", "exception": false, "start_time": "2023-03-14T16:30:15.076086", "status": "completed"}, "tags": []}, "source": ["The contrastive learning framework can easily be extended to have more _positive_ examples by sampling more than two augmentations of the same image.\n", "However, the most efficient training is usually obtained by using only two.\n", "\n", "Next, we can look at the specific augmentations we want to apply.\n", "The choice of the data augmentation to use is the most crucial hyperparameter in SimCLR since it directly affects how the latent space is structured, and what patterns might be learned from the data.\n", "Let's first take a look at some of the most popular data augmentations (figure credit - [Ting Chen and Geoffrey Hinton](https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html)):\n", "\n", "
\n", "\n", "All of them can be used, but it turns out that two augmentations stand out in their importance: crop-and-resize, and color distortion.\n", "Interestingly, however, they only lead to strong performance if they are used together, as discussed by [Ting Chen et al.](https://arxiv.org/abs/2002.05709) in their SimCLR paper.\n", "When performing random cropping and resizing, we can distinguish between two situations: (a) cropped image A provides a local view of cropped image B, or (b) cropped images C and D show neighboring views of the same image (figure credit - [Ting Chen and Geoffrey Hinton](https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html)).\n", "\n", "
\n", "\n", "While situation (a) requires the model to learn some sort of scale invariance to make crops A and B similar in latent space, situation (b) is more challenging since the model needs to recognize an object beyond its limited view.\n", "However, without color distortion, there is a loophole that the model can exploit, namely that different crops of the same image usually look very similar in color space.\n", "Consider the picture of the dog above.\n", "Simply from the color of the fur and the green color tone of the background, you can reason that two patches belong to the same image without actually recognizing the dog in the picture.\n", "In this case, the model might end up focusing only on the color histograms of the images, and ignore other more generalizable features.\n", "If, however, we distort the colors in the two patches randomly and independently of each other, the model cannot rely on this simple feature anymore.\n", "Hence, by combining random cropping and color distortions, the model can only match two patches by learning generalizable representations.\n", "\n", "Overall, for our experiments, we apply a set of 5 transformations following the original SimCLR setup: random horizontal flip, crop-and-resize, color distortion, random grayscale, and Gaussian blur.\n", "In comparison to the [original implementation](https://github.com/google-research/simclr), we reduce the effect of the color jitter slightly (0.5 instead of 0.8 for brightness, contrast, and saturation, and 0.1 instead of 0.2 for hue).\n", "In our experiments, this setting obtained better performance and was faster and more stable to train.\n", "If, for instance, the brightness scale varies strongly in a dataset, the\n", "original settings can be more beneficial since the model can't rely on\n", "this information anymore to distinguish between images."]}, {"cell_type": "code", "execution_count": 5, "id": "dc0d4496", "metadata": {"execution": {"iopub.execute_input": 
"2023-03-14T16:30:15.115368Z", "iopub.status.busy": "2023-03-14T16:30:15.114913Z", "iopub.status.idle": "2023-03-14T16:30:15.121609Z", "shell.execute_reply": "2023-03-14T16:30:15.120777Z"}, "papermill": {"duration": 0.02116, "end_time": "2023-03-14T16:30:15.123044", "exception": false, "start_time": "2023-03-14T16:30:15.101884", "status": "completed"}, "tags": []}, "outputs": [], "source": ["contrast_transforms = transforms.Compose(\n", " [\n", " transforms.RandomHorizontalFlip(),\n", " transforms.RandomResizedCrop(size=96),\n", " transforms.RandomApply([transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.1)], p=0.8),\n", " transforms.RandomGrayscale(p=0.2),\n", " transforms.GaussianBlur(kernel_size=9),\n", " transforms.ToTensor(),\n", " transforms.Normalize((0.5,), (0.5,)),\n", " ]\n", ")"]}, {"cell_type": "markdown", "id": "9eadbe5b", "metadata": {"papermill": {"duration": 0.01184, "end_time": "2023-03-14T16:30:15.152034", "exception": false, "start_time": "2023-03-14T16:30:15.140194", "status": "completed"}, "tags": []}, "source": ["After discussing the data augmentation techniques, we can now focus on the dataset.\n", "In this tutorial, we will use the [STL10 dataset](https://cs.stanford.edu/~acoates/stl10/), which, similarly to CIFAR10, contains images of 10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck.\n", "However, the images have a higher resolution, namely $96\\times 96$ pixels, and we are only provided with 500 labeled images per class.\n", "Additionally, we have a much larger set of $100,000$ unlabeled images which are similar to the training images but are sampled from a wider range of animals and vehicles.\n", "This makes the dataset ideal to showcase the benefits that self-supervised learning offers.\n", "\n", "Luckily, the STL10 dataset is provided through torchvision.\n", "Keep in mind, however, that since this dataset is relatively large and has a considerably higher resolution than CIFAR10, it 
requires more disk space (~3GB) and takes a bit of time to download.\n", "For our initial discussion of self-supervised learning and SimCLR, we\n", "will create two data loaders with our contrastive transformations above:\n", "the `unlabeled_data` will be used to train our model via contrastive\n", "learning, and `train_data_contrast` will be used as a validation set in\n", "contrastive learning."]}, {"cell_type": "code", "execution_count": 6, "id": "f6e52644", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:30:15.177362Z", "iopub.status.busy": "2023-03-14T16:30:15.176999Z", "iopub.status.idle": "2023-03-14T16:42:24.224689Z", "shell.execute_reply": "2023-03-14T16:42:24.222780Z"}, "papermill": {"duration": 729.079524, "end_time": "2023-03-14T16:42:24.243509", "exception": false, "start_time": "2023-03-14T16:30:15.163985", "status": "completed"}, "tags": []}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Downloading http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz to /__w/14/s/.datasets/stl10_binary.tar.gz\n"]}, {"data": {"application/vnd.jupyter.widget-view+json": {"model_id": "6c214464531a48c38a4cf8c9e528cb59", "version_major": 2, "version_minor": 0}, "text/plain": [" 0%| | 0/2640397119 [00:00\n"], "text/plain": ["
"]}, "metadata": {}, "output_type": "display_data"}], "source": ["# Visualize some examples\n", "L.seed_everything(42)\n", "NUM_IMAGES = 6\n", "imgs = torch.stack([img for idx in range(NUM_IMAGES) for img in unlabeled_data[idx][0]], dim=0)\n", "img_grid = torchvision.utils.make_grid(imgs, nrow=6, normalize=True, pad_value=0.9)\n", "img_grid = img_grid.permute(1, 2, 0)\n", "\n", "plt.figure(figsize=(10, 5))\n", "plt.title(\"Augmented image examples of the STL10 dataset\")\n", "plt.imshow(img_grid)\n", "plt.axis(\"off\")\n", "plt.show()\n", "plt.close()"]}, {"cell_type": "markdown", "id": "024547cb", "metadata": {"papermill": {"duration": 0.018472, "end_time": "2023-03-14T16:42:24.969320", "exception": false, "start_time": "2023-03-14T16:42:24.950848", "status": "completed"}, "tags": []}, "source": ["We see the wide variety of our data augmentation, including random cropping, grayscaling, Gaussian blur, and color distortion.\n", "Thus, it remains a challenging task for the model to match two independently augmented patches of the same image."]}, {"cell_type": "markdown", "id": "9f326103", "metadata": {"lines_to_next_cell": 2, "papermill": {"duration": 0.017967, "end_time": "2023-03-14T16:42:25.005426", "exception": false, "start_time": "2023-03-14T16:42:24.987459", "status": "completed"}, "tags": []}, "source": ["### SimCLR implementation\n", "\n", "Using the data loader pipeline above, we can now implement SimCLR.\n", "At each iteration, we get for every image $x$ two differently augmented versions, which we refer to as $\\tilde{x}_i$ and $\\tilde{x}_j$.\n", "Each of these images is encoded into a one-dimensional feature vector, and we want to maximize the similarity between the two vectors while minimizing it to all other images in the batch.\n", "The encoder network is split into two parts: a base encoder network $f(\\cdot)$, and a projection head $g(\\cdot)$.\n", "The base network is usually a deep CNN as we have seen in e.g. 
[Tutorial 5](https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial5/Inception_ResNet_DenseNet.html) before, and is responsible for extracting a representation vector from the augmented data examples.\n", "In our experiments, we will use the common ResNet-18 architecture as $f(\\cdot)$, and refer to the output as $f(\\tilde{x}_i)=h_i$.\n", "The projection head $g(\\cdot)$ maps the representation $h$ into a space where we apply the contrastive loss, i.e., compare similarities between vectors.\n", "It is often chosen to be a small MLP with non-linearities, and for simplicity, we follow the original SimCLR paper setup by defining it as a two-layer MLP with ReLU activation in the hidden layer.\n", "Note that in the follow-up paper, [SimCLRv2](https://arxiv.org/abs/2006.10029), the authors mention that larger/wider MLPs can boost the performance considerably.\n", "This is why we apply an MLP with a four times larger hidden dimension; deeper MLPs, however, were found to overfit on the given dataset.\n", "The general setup is visualized below (figure credit - [Ting Chen et al.](https://arxiv.org/abs/2006.10029)):\n", "\n", "
\n", "\n", "After finishing the training with contrastive learning, we will remove the projection head $g(\\cdot)$, and use $f(\\cdot)$ as a pretrained feature extractor.\n", "The representations $z$ that come out of the projection head $g(\\cdot)$ have been shown to perform worse than those of the base network $f(\\cdot)$ when finetuning the network for a new task.\n", "This is likely because the representations $z$ are trained to become invariant to many features like the color that can be important for downstream tasks.\n", "Thus, $g(\\cdot)$ is only needed for the contrastive learning stage.\n", "\n", "Now that the architecture is described, let's take a closer look at how we train the model.\n", "As mentioned before, we want to maximize the similarity between the representations of the two augmented versions of the same image, i.e., $z_i$ and $z_j$ in the figure above, while minimizing it to all other examples in the batch.\n", "SimCLR thereby applies the InfoNCE loss, originally proposed by [Aaron van den Oord et al. 
](https://arxiv.org/abs/1807.03748) for contrastive learning.\n", "In short, the InfoNCE loss compares the similarity of $z_i$ and $z_j$ to the similarity of $z_i$ to any other representation in the batch by performing a softmax over the similarity values.\n", "The loss can be formally written as:\n", "$$\n", "\\ell_{i,j}=-\\log \\frac{\\exp(\\text{sim}(z_i,z_j)/\\tau)}{\\sum_{k=1}^{2N}\\mathbb{1}_{[k\\neq i]}\\exp(\\text{sim}(z_i,z_k)/\\tau)}=-\\text{sim}(z_i,z_j)/\\tau+\\log\\left[\\sum_{k=1}^{2N}\\mathbb{1}_{[k\\neq i]}\\exp(\\text{sim}(z_i,z_k)/\\tau)\\right]\n", "$$\n", "The function $\\text{sim}$ is a similarity metric, and the hyperparameter $\\tau$ is called the temperature and determines how peaked the distribution is.\n", "Since many similarity metrics are bounded, the temperature parameter allows us to balance the influence of many dissimilar image patches versus one similar patch.\n", "The similarity metric that is used in SimCLR is cosine similarity, as defined below:\n", "$$\n", "\\text{sim}(z_i,z_j) = \\frac{z_i^\\top \\cdot z_j}{||z_i||\\cdot||z_j||}\n", "$$\n", "The maximum cosine similarity possible is $1$, while the minimum is $-1$.\n", "In general, we will see that the features of two different images will converge to a cosine similarity around zero since the minimum, $-1$, would require $z_i$ and $z_j$ to be in the exact opposite direction in all feature dimensions, which does not allow for great flexibility.\n", "\n", "Finally, now that we have discussed all details, let's implement SimCLR below as a PyTorch Lightning module:"]}, {"cell_type": "code", "execution_count": 8, "id": "65d1d07f", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:25.042751Z", "iopub.status.busy": "2023-03-14T16:42:25.042549Z", "iopub.status.idle": "2023-03-14T16:42:25.054240Z", "shell.execute_reply": "2023-03-14T16:42:25.053609Z"}, "lines_to_next_cell": 2, "papermill": {"duration": 0.032418, "end_time": "2023-03-14T16:42:25.055828", "exception": false, 
"start_time": "2023-03-14T16:42:25.023410", "status": "completed"}, "tags": []}, "outputs": [], "source": ["class SimCLR(L.LightningModule):\n", "    def __init__(self, hidden_dim, lr, temperature, weight_decay, max_epochs=500):\n", "        super().__init__()\n", "        self.save_hyperparameters()\n", "        assert self.hparams.temperature > 0.0, \"The temperature must be a positive float!\"\n", "        # Base model f(.)\n", "        self.convnet = torchvision.models.resnet18(\n", "            pretrained=False, num_classes=4 * hidden_dim\n", "        )  # num_classes is the output size of the last linear layer\n", "        # The MLP for g(.) consists of Linear->ReLU->Linear\n", "        self.convnet.fc = nn.Sequential(\n", "            self.convnet.fc,  # Linear(ResNet output, 4*hidden_dim)\n", "            nn.ReLU(inplace=True),\n", "            nn.Linear(4 * hidden_dim, hidden_dim),\n", "        )\n", "\n", "    def configure_optimizers(self):\n", "        optimizer = optim.AdamW(self.parameters(), lr=self.hparams.lr, weight_decay=self.hparams.weight_decay)\n", "        lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(\n", "            optimizer, T_max=self.hparams.max_epochs, eta_min=self.hparams.lr / 50\n", "        )\n", "        return [optimizer], [lr_scheduler]\n", "\n", "    def info_nce_loss(self, batch, mode=\"train\"):\n", "        imgs, _ = batch\n", "        imgs = torch.cat(imgs, dim=0)\n", "\n", "        # Encode all images\n", "        feats = self.convnet(imgs)\n", "        # Calculate cosine similarity\n", "        cos_sim = F.cosine_similarity(feats[:, None, :], feats[None, :, :], dim=-1)\n", "        # Mask out cosine similarity to itself\n", "        self_mask = torch.eye(cos_sim.shape[0], dtype=torch.bool, device=cos_sim.device)\n", "        cos_sim.masked_fill_(self_mask, -9e15)\n", "        # Find positive example -> batch_size//2 away from the original example\n", "        pos_mask = self_mask.roll(shifts=cos_sim.shape[0] // 2, dims=0)\n", "        # InfoNCE loss\n", "        cos_sim = cos_sim / self.hparams.temperature\n", "        nll = -cos_sim[pos_mask] + torch.logsumexp(cos_sim, dim=-1)\n", "        nll = nll.mean()\n", "\n", "        # Logging loss\n", "        self.log(mode + \"_loss\", 
nll)\n", "        # Get ranking position of positive example\n", "        comb_sim = torch.cat(\n", "            [cos_sim[pos_mask][:, None], cos_sim.masked_fill(pos_mask, -9e15)],  # First position positive example\n", "            dim=-1,\n", "        )\n", "        sim_argsort = comb_sim.argsort(dim=-1, descending=True).argmin(dim=-1)\n", "        # Logging ranking metrics\n", "        self.log(mode + \"_acc_top1\", (sim_argsort == 0).float().mean())\n", "        self.log(mode + \"_acc_top5\", (sim_argsort < 5).float().mean())\n", "        self.log(mode + \"_acc_mean_pos\", 1 + sim_argsort.float().mean())\n", "\n", "        return nll\n", "\n", "    def training_step(self, batch, batch_idx):\n", "        return self.info_nce_loss(batch, mode=\"train\")\n", "\n", "    def validation_step(self, batch, batch_idx):\n", "        self.info_nce_loss(batch, mode=\"val\")"]}, {"cell_type": "markdown", "id": "64e1c06e", "metadata": {"papermill": {"duration": 0.017708, "end_time": "2023-03-14T16:42:25.098050", "exception": false, "start_time": "2023-03-14T16:42:25.080342", "status": "completed"}, "tags": []}, "source": ["As an alternative to validating on the contrastive learning loss, we could also take a simple, small downstream task, and track the performance of the base network $f(\\cdot)$ on that.\n", "However, in this tutorial, we will restrict ourselves to the STL10\n", "dataset where we use the task of image classification on STL10 as our\n", "test task."]}, {"cell_type": "markdown", "id": "508060a8", "metadata": {"lines_to_next_cell": 2, "papermill": {"duration": 0.017351, "end_time": "2023-03-14T16:42:25.133553", "exception": false, "start_time": "2023-03-14T16:42:25.116202", "status": "completed"}, "tags": []}, "source": ["### Training\n", "\n", "Now that we have implemented SimCLR and the data loading pipeline, we are ready to train the model.\n", "We will use the same training function setup as usual.\n", "For saving the best model checkpoint, we track the metric `val_acc_top5`, which describes how often the correct image patch is within the 
top-5 most similar examples in the batch.\n", "This is usually less noisy than the top-1 metric, making it a better criterion for selecting the best model."]}, {"cell_type": "code", "execution_count": 9, "id": "9929af38", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:25.169518Z", "iopub.status.busy": "2023-03-14T16:42:25.169344Z", "iopub.status.idle": "2023-03-14T16:42:25.175814Z", "shell.execute_reply": "2023-03-14T16:42:25.175194Z"}, "papermill": {"duration": 0.026319, "end_time": "2023-03-14T16:42:25.177414", "exception": false, "start_time": "2023-03-14T16:42:25.151095", "status": "completed"}, "tags": []}, "outputs": [], "source": ["def train_simclr(batch_size, max_epochs=500, **kwargs):\n", "    trainer = L.Trainer(\n", "        default_root_dir=os.path.join(CHECKPOINT_PATH, \"SimCLR\"),\n", "        accelerator=\"auto\",\n", "        devices=1,\n", "        max_epochs=max_epochs,\n", "        callbacks=[\n", "            ModelCheckpoint(save_weights_only=True, mode=\"max\", monitor=\"val_acc_top5\"),\n", "            LearningRateMonitor(\"epoch\"),\n", "        ],\n", "    )\n", "    trainer.logger._default_hp_metric = None  # Optional logging argument that we don't need\n", "\n", "    # Check whether pretrained model exists. If yes, load it and skip training\n", "    pretrained_filename = os.path.join(CHECKPOINT_PATH, \"SimCLR.ckpt\")\n", "    if os.path.isfile(pretrained_filename):\n", "        print(f\"Found pretrained model at {pretrained_filename}, loading...\")\n", "        # Automatically loads the model with the saved hyperparameters\n", "        model = SimCLR.load_from_checkpoint(pretrained_filename)\n", "    else:\n", "        train_loader = data.DataLoader(\n", "            unlabeled_data,\n", "            batch_size=batch_size,\n", "            shuffle=True,\n", "            drop_last=True,\n", "            pin_memory=True,\n", "            num_workers=NUM_WORKERS,\n", "        )\n", "        val_loader = data.DataLoader(\n", "            train_data_contrast,\n", "            batch_size=batch_size,\n", "            shuffle=False,\n", "            drop_last=False,\n", "            pin_memory=True,\n", "            num_workers=NUM_WORKERS,\n", "        )\n", "        L.seed_everything(42)  # To be reproducible\n", "        model = SimCLR(max_epochs=max_epochs, **kwargs)\n", "        trainer.fit(model, train_loader, val_loader)\n", "        # Load best checkpoint after training\n", "        model = SimCLR.load_from_checkpoint(trainer.checkpoint_callback.best_model_path)\n", "\n", "    return model"]}, {"cell_type": "markdown", "id": "e5efffef", "metadata": {"papermill": {"duration": 0.017616, "end_time": "2023-03-14T16:42:25.216781", "exception": false, "start_time": "2023-03-14T16:42:25.199165", "status": "completed"}, "tags": []}, "source": ["A common observation in contrastive learning is that the larger the batch size, the better the models perform.\n", "A larger batch size allows us to compare each image to more negative examples, leading to overall smoother loss gradients.\n", "However, in our case, we found that a batch size of 256 was sufficient to get good results."]}, {"cell_type": "code", "execution_count": 10, "id": "b78fed97", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:25.253145Z", "iopub.status.busy": "2023-03-14T16:42:25.252888Z", "iopub.status.idle": "2023-03-14T16:42:25.738069Z", "shell.execute_reply": "2023-03-14T16:42:25.737006Z"}, "papermill": 
{"duration": 0.506283, "end_time": "2023-03-14T16:42:25.740568", "exception": false, "start_time": "2023-03-14T16:42:25.234285", "status": "completed"}, "tags": []}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["GPU available: True (cuda), used: True\n"]}, {"name": "stderr", "output_type": "stream", "text": ["TPU available: False, using: 0 TPU cores\n"]}, {"name": "stderr", "output_type": "stream", "text": ["IPU available: False, using: 0 IPUs\n"]}, {"name": "stderr", "output_type": "stream", "text": ["HPU available: False, using: 0 HPUs\n"]}, {"name": "stderr", "output_type": "stream", "text": ["Lightning automatically upgraded your loaded checkpoint from v1.3.4 to v2.0.0rc0. To apply the upgrade to your files permanently, run `python -m lightning.pytorch.utilities.upgrade_checkpoint --file saved_models/ContrastiveLearning/SimCLR.ckpt`\n"]}, {"name": "stderr", "output_type": "stream", "text": ["/usr/local/lib/python3.9/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\n", " warnings.warn(\n", "/usr/local/lib/python3.9/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. 
The current behavior is equivalent to passing `weights=None`.\n", " warnings.warn(msg)\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Found pretrained model at saved_models/ContrastiveLearning/SimCLR.ckpt, loading...\n"]}], "source": ["simclr_model = train_simclr(\n", " batch_size=256, hidden_dim=128, lr=5e-4, temperature=0.07, weight_decay=1e-4, max_epochs=500\n", ")"]}, {"cell_type": "markdown", "id": "bc4fe6ac", "metadata": {"papermill": {"duration": 0.02314, "end_time": "2023-03-14T16:42:25.787365", "exception": false, "start_time": "2023-03-14T16:42:25.764225", "status": "completed"}, "tags": []}, "source": ["To get an intuition of how training with contrastive learning behaves, we can take a look at the TensorBoard below:"]}, {"cell_type": "code", "execution_count": 11, "id": "49f850bb", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:25.830375Z", "iopub.status.busy": "2023-03-14T16:42:25.830165Z", "iopub.status.idle": "2023-03-14T16:42:26.885423Z", "shell.execute_reply": "2023-03-14T16:42:26.884016Z"}, "papermill": {"duration": 1.078457, "end_time": "2023-03-14T16:42:26.888220", "exception": false, "start_time": "2023-03-14T16:42:25.809763", "status": "completed"}, "tags": []}, "outputs": [{"data": {"text/html": ["\n", " \n", " \n", " "], "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["%tensorboard --logdir ../saved_models/tutorial17/tensorboards/SimCLR/"]}, {"cell_type": "markdown", "id": "1168210d", "metadata": {"papermill": {"duration": 0.019185, "end_time": "2023-03-14T16:42:26.929127", "exception": false, "start_time": "2023-03-14T16:42:26.909942", "status": "completed"}, "tags": []}, "source": ["
![tensorboard simclr](){width=\"1200px\"}
\n", "\n", "One thing to note is that contrastive learning benefits a lot from long training.\n", "The plot shown above is from a training run that took approximately 1 day on an NVIDIA TitanRTX.\n", "Training the model for even longer might reduce its loss further, but we did not observe any gains from it on the downstream image classification task.\n", "In general, contrastive learning can also benefit from using larger models, if sufficient unlabeled data is available."]}, {"cell_type": "markdown", "id": "190a8531", "metadata": {"lines_to_next_cell": 2, "papermill": {"duration": 0.018009, "end_time": "2023-03-14T16:42:26.966962", "exception": false, "start_time": "2023-03-14T16:42:26.948953", "status": "completed"}, "tags": []}, "source": ["## Logistic Regression\n", "\n", "
\n", "After we have trained our model via contrastive learning, we can deploy it on downstream tasks and see how well it performs with little data.\n", "A common setup, which also verifies whether the model has learned generalized representations, is to perform Logistic Regression on the features.\n", "In other words, we learn a single, linear layer that maps the representations to a class prediction.\n", "Since the base network $f(\\cdot)$ is not changed during the training process, the model can only perform well if the representations $h$ describe all features that might be necessary for the task.\n", "Further, we do not have to worry too much about overfitting since we have very few parameters that are trained.\n", "Hence, we might expect that the model can perform well even with very little data.\n", "\n", "First, let's implement a simple Logistic Regression setup for which we assume that the images have already been encoded as feature vectors.\n", "If very little data is available, it might be beneficial to dynamically encode the images during training so that we can also apply data augmentations.\n", "However, the way we implement it here is much more efficient and can be trained within a few seconds.\n", "Further, using data augmentations did not show any significant gain in this simple setup."]}, {"cell_type": "code", "execution_count": 12, "id": "974f524c", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:27.005553Z", "iopub.status.busy": "2023-03-14T16:42:27.005158Z", "iopub.status.idle": "2023-03-14T16:42:27.019689Z", "shell.execute_reply": "2023-03-14T16:42:27.018754Z"}, "papermill": {"duration": 0.036921, "end_time": "2023-03-14T16:42:27.021952", "exception": false, "start_time": "2023-03-14T16:42:26.985031", "status": "completed"}, "tags": []}, "outputs": [], "source": ["class LogisticRegression(L.LightningModule):\n", "    def __init__(self, feature_dim, num_classes, lr, weight_decay, max_epochs=100):\n", "        super().__init__()\n", "        self.save_hyperparameters()\n", "        # Mapping from representation h to classes\n", "        self.model = nn.Linear(feature_dim, num_classes)\n", "\n", "    def configure_optimizers(self):\n", "        optimizer = optim.AdamW(self.parameters(), lr=self.hparams.lr, weight_decay=self.hparams.weight_decay)\n", "        lr_scheduler = optim.lr_scheduler.MultiStepLR(\n", "            optimizer, milestones=[int(self.hparams.max_epochs * 0.6), int(self.hparams.max_epochs * 0.8)], gamma=0.1\n", "        )\n", "        return [optimizer], [lr_scheduler]\n", "\n", "    def _calculate_loss(self, batch, mode=\"train\"):\n", "        feats, labels = batch\n", "        preds = self.model(feats)\n", "        loss = F.cross_entropy(preds, labels)\n", "        acc = (preds.argmax(dim=-1) == labels).float().mean()\n", "\n", "        self.log(mode + \"_loss\", loss)\n", "        self.log(mode + \"_acc\", acc)\n", "        return loss\n", "\n", "    def training_step(self, batch, batch_idx):\n", "        return self._calculate_loss(batch, mode=\"train\")\n", "\n", "    def validation_step(self, batch, batch_idx):\n", "        self._calculate_loss(batch, mode=\"val\")\n", "\n", "    def test_step(self, batch, batch_idx):\n", "        self._calculate_loss(batch, mode=\"test\")"]}, {"cell_type": "markdown", "id": "80ddca76", "metadata": {"papermill": {"duration": 0.018051, "end_time": "2023-03-14T16:42:27.063734", "exception": false, "start_time": "2023-03-14T16:42:27.045683", "status": "completed"}, "tags": []}, "source": ["The data we use is the training and test set of STL10.\n", "The training set contains 500 images per class, while the test set has 800 images per class."]}, {"cell_type": "code", "execution_count": 13, "id": "56032a1c", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:27.101435Z", "iopub.status.busy": "2023-03-14T16:42:27.101069Z", "iopub.status.idle": "2023-03-14T16:42:38.091509Z", "shell.execute_reply": "2023-03-14T16:42:38.089906Z"}, "papermill": {"duration": 11.012423, "end_time": "2023-03-14T16:42:38.094187", "exception": false, "start_time": 
"2023-03-14T16:42:27.081764", "status": "completed"}, "tags": []}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Files already downloaded and verified\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Files already downloaded and verified\n", "Number of training examples: 5000\n", "Number of test examples: 8000\n"]}], "source": ["img_transforms = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])\n", "\n", "train_img_data = STL10(root=DATASET_PATH, split=\"train\", download=True, transform=img_transforms)\n", "test_img_data = STL10(root=DATASET_PATH, split=\"test\", download=True, transform=img_transforms)\n", "\n", "print(\"Number of training examples:\", len(train_img_data))\n", "print(\"Number of test examples:\", len(test_img_data))"]}, {"cell_type": "markdown", "id": "8fc7413f", "metadata": {"lines_to_next_cell": 2, "papermill": {"duration": 0.025216, "end_time": "2023-03-14T16:42:38.141873", "exception": false, "start_time": "2023-03-14T16:42:38.116657", "status": "completed"}, "tags": []}, "source": ["Next, we implement a small function to encode all images in our datasets.\n", "The output representations are then used as inputs to the Logistic Regression model."]}, {"cell_type": "code", "execution_count": 14, "id": "0418ff89", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:38.181580Z", "iopub.status.busy": "2023-03-14T16:42:38.181056Z", "iopub.status.idle": "2023-03-14T16:42:38.194566Z", "shell.execute_reply": "2023-03-14T16:42:38.193477Z"}, "papermill": {"duration": 0.036857, "end_time": "2023-03-14T16:42:38.197050", "exception": false, "start_time": "2023-03-14T16:42:38.160193", "status": "completed"}, "tags": []}, "outputs": [], "source": ["@torch.no_grad()\n", "def prepare_data_features(model, dataset):\n", " # Prepare model\n", " network = deepcopy(model.convnet)\n", " network.fc = nn.Identity() # Removing projection head g(.)\n", " network.eval()\n", " 
network.to(device)\n", "\n", "    # Encode all images\n", "    data_loader = data.DataLoader(dataset, batch_size=64, num_workers=NUM_WORKERS, shuffle=False, drop_last=False)\n", "    feats, labels = [], []\n", "    for batch_imgs, batch_labels in tqdm(data_loader):\n", "        batch_imgs = batch_imgs.to(device)\n", "        batch_feats = network(batch_imgs)\n", "        feats.append(batch_feats.detach().cpu())\n", "        labels.append(batch_labels)\n", "\n", "    feats = torch.cat(feats, dim=0)\n", "    labels = torch.cat(labels, dim=0)\n", "\n", "    # Sort images by labels\n", "    labels, idxs = labels.sort()\n", "    feats = feats[idxs]\n", "\n", "    return data.TensorDataset(feats, labels)"]}, {"cell_type": "markdown", "id": "09920606", "metadata": {"papermill": {"duration": 0.018256, "end_time": "2023-03-14T16:42:38.240971", "exception": false, "start_time": "2023-03-14T16:42:38.222715", "status": "completed"}, "tags": []}, "source": ["Let's apply the function to both training and test set below."]}, {"cell_type": "code", "execution_count": 15, "id": "47072448", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:38.279007Z", "iopub.status.busy": "2023-03-14T16:42:38.278644Z", "iopub.status.idle": "2023-03-14T16:42:56.770916Z", "shell.execute_reply": "2023-03-14T16:42:56.769573Z"}, "papermill": {"duration": 18.513942, "end_time": "2023-03-14T16:42:56.773149", "exception": false, "start_time": "2023-03-14T16:42:38.259207", "status": "completed"}, "tags": []}, "outputs": [{"data": {"application/vnd.jupyter.widget-view+json": {"model_id": "5fde1f2547174b81809ba1ebea56b1ef", "version_major": 2, "version_minor": 0}, "text/plain": [" 0%| | 0/79 [00:00\n", "[Figure (image/svg+xml, Matplotlib v3.7.1, https://matplotlib.org/): STL10 classification over dataset size; x-axis: Number of images per class; y-axis: Test accuracy]\n"], "text/plain": ["
"]}, "metadata": {}, "output_type": "display_data"}, {"name": "stdout", "output_type": "stream", "text": ["Test accuracy for 10 images per label: 62.81%\n", "Test accuracy for 20 images per label: 68.60%\n", "Test accuracy for 50 images per label: 74.44%\n", "Test accuracy for 100 images per label: 77.20%\n", "Test accuracy for 200 images per label: 79.06%\n", "Test accuracy for 500 images per label: 81.33%\n"]}], "source": ["dataset_sizes = sorted(k for k in results)\n", "test_scores = [results[k][\"test\"] for k in dataset_sizes]\n", "\n", "fig = plt.figure(figsize=(6, 4))\n", "plt.plot(\n", " dataset_sizes,\n", " test_scores,\n", " \"--\",\n", " color=\"#000\",\n", " marker=\"*\",\n", " markeredgecolor=\"#000\",\n", " markerfacecolor=\"y\",\n", " markersize=16,\n", ")\n", "plt.xscale(\"log\")\n", "plt.xticks(dataset_sizes, labels=dataset_sizes)\n", "plt.title(\"STL10 classification over dataset size\", fontsize=14)\n", "plt.xlabel(\"Number of images per class\")\n", "plt.ylabel(\"Test accuracy\")\n", "plt.minorticks_off()\n", "plt.show()\n", "\n", "for k, score in zip(dataset_sizes, test_scores):\n", " print(f\"Test accuracy for {k:3d} images per label: {100*score:4.2f}%\")"]}, {"cell_type": "markdown", "id": "1ff2c8cb", "metadata": {"papermill": {"duration": 0.022404, "end_time": "2023-03-14T16:42:59.153434", "exception": false, "start_time": "2023-03-14T16:42:59.131030", "status": "completed"}, "tags": []}, "source": ["As one would expect, the classification performance improves the more data we have.\n", "However, with only 10 images per class, we can already classify more than 60% of the images correctly.\n", "This is quite impressive, considering that the images are also higher dimensional than e.g. 
CIFAR10.\n", "With the full dataset, we achieve an accuracy of 81%.\n", "The increase from 50 to 500 images per class might suggest a linear increase in performance with an exponentially larger dataset.\n", "However, with even more data, we could also finetune $f(\\cdot)$ in the training process, allowing for the representations to adapt more to the specific classification task given.\n", "\n", "To put the results above into perspective, we will train the base\n", "network, a ResNet-18, on the classification task from scratch."]}, {"cell_type": "markdown", "id": "16755fba", "metadata": {"lines_to_next_cell": 2, "papermill": {"duration": 0.02258, "end_time": "2023-03-14T16:42:59.198796", "exception": false, "start_time": "2023-03-14T16:42:59.176216", "status": "completed"}, "tags": []}, "source": ["## Baseline\n", "\n", "As a baseline to our results above, we will train a standard ResNet-18 with random initialization on the labeled training set of STL10.\n", "The results will give us an indication of the advantages that contrastive learning on unlabeled data has compared to using only supervised training.\n", "The implementation of the model is straightforward since the ResNet\n", "architecture is provided in the torchvision library."]}, {"cell_type": "code", "execution_count": 20, "id": "806d2b7f", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:59.246398Z", "iopub.status.busy": "2023-03-14T16:42:59.245515Z", "iopub.status.idle": "2023-03-14T16:42:59.258825Z", "shell.execute_reply": "2023-03-14T16:42:59.258171Z"}, "papermill": {"duration": 0.039655, "end_time": "2023-03-14T16:42:59.261006", "exception": false, "start_time": "2023-03-14T16:42:59.221351", "status": "completed"}, "tags": []}, "outputs": [], "source": ["class ResNet(L.LightningModule):\n", "    def __init__(self, num_classes, lr, weight_decay, max_epochs=100):\n", "        super().__init__()\n", "        self.save_hyperparameters()\n", "        self.model = torchvision.models.resnet18(pretrained=False, num_classes=num_classes)\n", "\n", "    def configure_optimizers(self):\n", "        optimizer = optim.AdamW(self.parameters(), lr=self.hparams.lr, weight_decay=self.hparams.weight_decay)\n", "        lr_scheduler = optim.lr_scheduler.MultiStepLR(\n", "            optimizer, milestones=[int(self.hparams.max_epochs * 0.7), int(self.hparams.max_epochs * 0.9)], gamma=0.1\n", "        )\n", "        return [optimizer], [lr_scheduler]\n", "\n", "    def _calculate_loss(self, batch, mode=\"train\"):\n", "        imgs, labels = batch\n", "        preds = self.model(imgs)\n", "        loss = F.cross_entropy(preds, labels)\n", "        acc = (preds.argmax(dim=-1) == labels).float().mean()\n", "\n", "        self.log(mode + \"_loss\", loss)\n", "        self.log(mode + \"_acc\", acc)\n", "        return loss\n", "\n", "    def training_step(self, batch, batch_idx):\n", "        return self._calculate_loss(batch, mode=\"train\")\n", "\n", "    def validation_step(self, batch, batch_idx):\n", "        self._calculate_loss(batch, mode=\"val\")\n", "\n", "    def test_step(self, batch, batch_idx):\n", "        self._calculate_loss(batch, mode=\"test\")"]}, {"cell_type": "markdown", "id": "d61e37bb", "metadata": {"papermill": {"duration": 0.022644, "end_time": "2023-03-14T16:42:59.312531", "exception": false, "start_time": "2023-03-14T16:42:59.289887", "status": "completed"}, "tags": []}, "source": ["It is clear that the ResNet easily overfits on the training data since its parameter count is more than 1000 times larger than the dataset size.\n", "To make the comparison to the contrastive learning models fair, we apply data augmentations similar to the ones we used before: horizontal flip, crop-and-resize, grayscale, and Gaussian blur.\n", "Color distortions as before are not used because the color distribution of an image proved to be an important feature for the classification.\n", "Hence, we observed no noticeable performance gains when adding color distortions to the set of augmentations.\n", "Similarly, we restrict the resizing operation before cropping to at most\n", "125% of its original 
resolution, instead of 1250% as done in SimCLR.\n", "This is because, for classification, the model needs to recognize the full object, while in contrastive learning, we only want to check whether two patches belong to the same image/object.\n", "Hence, the chosen augmentations below are overall weaker than in the contrastive learning case."]}, {"cell_type": "code", "execution_count": 21, "id": "629179be", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:42:59.359972Z", "iopub.status.busy": "2023-03-14T16:42:59.359482Z", "iopub.status.idle": "2023-03-14T16:43:05.096021Z", "shell.execute_reply": "2023-03-14T16:43:05.094288Z"}, "papermill": {"duration": 5.763617, "end_time": "2023-03-14T16:43:05.098810", "exception": false, "start_time": "2023-03-14T16:42:59.335193", "status": "completed"}, "tags": []}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Files already downloaded and verified\n"]}], "source": ["train_transforms = transforms.Compose(\n", "    [\n", "        transforms.RandomHorizontalFlip(),\n", "        transforms.RandomResizedCrop(size=96, scale=(0.8, 1.0)),\n", "        transforms.RandomGrayscale(p=0.2),\n", "        transforms.GaussianBlur(kernel_size=9, sigma=(0.1, 0.5)),\n", "        transforms.ToTensor(),\n", "        transforms.Normalize((0.5,), (0.5,)),\n", "    ]\n", ")\n", "\n", "train_img_aug_data = STL10(root=DATASET_PATH, split=\"train\", download=True, transform=train_transforms)"]}, {"cell_type": "markdown", "id": "d49ad523", "metadata": {"lines_to_next_cell": 2, "papermill": {"duration": 0.02242, "end_time": "2023-03-14T16:43:05.147454", "exception": false, "start_time": "2023-03-14T16:43:05.125034", "status": "completed"}, "tags": []}, "source": ["The training function for the ResNet is almost identical to the Logistic Regression setup.\n", "Note that we allow the ResNet to perform validation every 2 epochs to\n", "also check whether the model overfits strongly in the first iterations\n", "or not."]}, {"cell_type": "code", "execution_count": 22, "id": 
"8975379a", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:43:05.195976Z", "iopub.status.busy": "2023-03-14T16:43:05.195439Z", "iopub.status.idle": "2023-03-14T16:43:05.213417Z", "shell.execute_reply": "2023-03-14T16:43:05.212593Z"}, "papermill": {"duration": 0.045535, "end_time": "2023-03-14T16:43:05.215628", "exception": false, "start_time": "2023-03-14T16:43:05.170093", "status": "completed"}, "tags": []}, "outputs": [], "source": ["def train_resnet(batch_size, max_epochs=100, **kwargs):\n", " trainer = L.Trainer(\n", " default_root_dir=os.path.join(CHECKPOINT_PATH, \"ResNet\"),\n", " accelerator=\"auto\",\n", " devices=1,\n", " max_epochs=max_epochs,\n", " callbacks=[\n", " ModelCheckpoint(save_weights_only=True, mode=\"max\", monitor=\"val_acc\"),\n", " LearningRateMonitor(\"epoch\"),\n", " ],\n", " check_val_every_n_epoch=2,\n", " )\n", " trainer.logger._default_hp_metric = None\n", "\n", " # Data loaders\n", " train_loader = data.DataLoader(\n", " train_img_aug_data,\n", " batch_size=batch_size,\n", " shuffle=True,\n", " drop_last=True,\n", " pin_memory=True,\n", " num_workers=NUM_WORKERS,\n", " )\n", " test_loader = data.DataLoader(\n", " test_img_data, batch_size=batch_size, shuffle=False, drop_last=False, pin_memory=True, num_workers=NUM_WORKERS\n", " )\n", "\n", " # Check whether pretrained model exists. 
If yes, load it and skip training\n", "    pretrained_filename = os.path.join(CHECKPOINT_PATH, \"ResNet.ckpt\")\n", "    if os.path.isfile(pretrained_filename):\n", "        print(\"Found pretrained model at %s, loading...\" % pretrained_filename)\n", "        model = ResNet.load_from_checkpoint(pretrained_filename)\n", "    else:\n", "        L.seed_everything(42)  # To be reproducible\n", "        model = ResNet(**kwargs)\n", "        trainer.fit(model, train_loader, test_loader)\n", "        model = ResNet.load_from_checkpoint(trainer.checkpoint_callback.best_model_path)\n", "\n", "    # Evaluate the best model on the training and test sets\n", "    train_result = trainer.test(model, dataloaders=train_loader, verbose=False)\n", "    val_result = trainer.test(model, dataloaders=test_loader, verbose=False)\n", "    result = {\"train\": train_result[0][\"test_acc\"], \"test\": val_result[0][\"test_acc\"]}\n", "\n", "    return model, result"]}, {"cell_type": "markdown", "id": "f3f9b520", "metadata": {"papermill": {"duration": 0.023048, "end_time": "2023-03-14T16:43:05.265015", "exception": false, "start_time": "2023-03-14T16:43:05.241967", "status": "completed"}, "tags": []}, "source": ["Finally, let's train the model and check its results:"]}, {"cell_type": "code", "execution_count": 23, "id": "74eb06f0", "metadata": {"execution": {"iopub.execute_input": "2023-03-14T16:43:05.311708Z", "iopub.status.busy": "2023-03-14T16:43:05.311345Z", "iopub.status.idle": "2023-03-14T16:43:24.823549Z", "shell.execute_reply": "2023-03-14T16:43:24.822126Z"}, "papermill": {"duration": 19.538827, "end_time": "2023-03-14T16:43:24.826242", "exception": false, "start_time": "2023-03-14T16:43:05.287415", "status": "completed"}, "tags": []}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["GPU available: True (cuda), used: True\n"]}, {"name": "stderr", "output_type": "stream", "text": ["TPU available: False, using: 0 TPU cores\n"]}, {"name": "stderr", "output_type": "stream", "text": ["IPU available: False, using: 0 IPUs\n"]}, {"name": "stderr", 
"output_type": "stream", "text": ["HPU available: False, using: 0 HPUs\n"]}, {"name": "stderr", "output_type": "stream", "text": ["Lightning automatically upgraded your loaded checkpoint from v1.3.4 to v2.0.0rc0. To apply the upgrade to your files permanently, run `python -m lightning.pytorch.utilities.upgrade_checkpoint --file saved_models/ContrastiveLearning/ResNet.ckpt`\n"]}, {"name": "stderr", "output_type": "stream", "text": ["/usr/local/lib/python3.9/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\n", " warnings.warn(\n", "/usr/local/lib/python3.9/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.\n", " warnings.warn(msg)\n"]}, {"name": "stdout", "output_type": "stream", "text": ["Found pretrained model at saved_models/ContrastiveLearning/ResNet.ckpt, loading...\n"]}, {"name": "stderr", "output_type": "stream", "text": ["You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. 
For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision\n"]}, {"name": "stderr", "output_type": "stream", "text": ["Missing logger folder: saved_models/ContrastiveLearning/ResNet/lightning_logs\n"]}, {"name": "stderr", "output_type": "stream", "text": ["LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [4,5]\n"]}, {"name": "stderr", "output_type": "stream", "text": ["/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:326: PossibleUserWarning: Your `test_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test/predict dataloaders.\n", " rank_zero_warn(\n"]}, {"data": {"application/vnd.jupyter.widget-view+json": {"model_id": "3f95f7919038413281002363097395d0", "version_major": 2, "version_minor": 0}, "text/plain": ["Testing: 0it [00:00, ?it/s]"]}, "metadata": {}, "output_type": "display_data"}, {"name": "stderr", "output_type": "stream", "text": ["You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. 
For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision\n"]}, {"name": "stderr", "output_type": "stream", "text": ["LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [4,5]\n"]}, {"data": {"application/vnd.jupyter.widget-view+json": {"model_id": "0fab58f6815044039651fe8ed64636ad", "version_major": 2, "version_minor": 0}, "text/plain": ["Testing: 0it [00:00, ?it/s]"]}, "metadata": {}, "output_type": "display_data"}, {"name": "stdout", "output_type": "stream", "text": ["Accuracy on training set: 99.66%\n", "Accuracy on test set: 73.31%\n"]}], "source": ["resnet_model, resnet_result = train_resnet(batch_size=64, num_classes=10, lr=1e-3, weight_decay=2e-4, max_epochs=100)\n", "print(f\"Accuracy on training set: {100*resnet_result['train']:4.2f}%\")\n", "print(f\"Accuracy on test set: {100*resnet_result['test']:4.2f}%\")"]}, {"cell_type": "markdown", "id": "8057670e", "metadata": {"papermill": {"duration": 0.030522, "end_time": "2023-03-14T16:43:24.887312", "exception": false, "start_time": "2023-03-14T16:43:24.856790", "status": "completed"}, "tags": []}, "source": ["The ResNet trained from scratch achieves 73.31% on the test set.\n", "This is about 8 percentage points less than the contrastive learning model (81.33%), and even slightly less than SimCLR achieves with 1/10 of the labeled data (74.44%).\n", "This shows that self-supervised, contrastive learning provides\n", "considerable performance gains by leveraging large amounts of unlabeled\n", "data when little labeled data is available."]}, {"cell_type": "markdown", "id": "ea731460", "metadata": {"papermill": {"duration": 0.02327, "end_time": "2023-03-14T16:43:24.933968", "exception": false, "start_time": "2023-03-14T16:43:24.910698", "status": "completed"}, "tags": []}, "source": ["## Conclusion\n", "\n", "In this tutorial, we have discussed self-supervised contrastive learning and implemented SimCLR as an example method.\n", "We have applied it to the STL10 dataset and shown that it 
can learn generalizable representations that we can use to train simple classification models.\n", "With 500 images per label, it achieved almost 8 percentage points higher accuracy than a similar model trained solely with supervision, and performed on par with it when using only a tenth of the labeled data.\n", "Our experimental results are limited to a single dataset, but recent works such as [Ting Chen et al.](https://arxiv.org/abs/2006.10029) showed similar trends for larger datasets like ImageNet.\n", "Besides the discussed hyperparameters, the size of the model seems to be important in contrastive learning as well.\n", "If a lot of unlabeled data is available, larger models can achieve much stronger results and come close to their supervised baselines.\n", "Further, there are also approaches for combining contrastive and supervised learning, leading to performance gains beyond supervision (see [Khosla et al.](https://arxiv.org/abs/2004.11362)).\n", "Moreover, contrastive learning is not the only approach to self-supervised learning that has emerged in the last two years and shown great results.\n", "Other methods include distillation-based methods like [BYOL](https://arxiv.org/abs/2006.07733) and redundancy reduction techniques like [Barlow Twins](https://arxiv.org/abs/2103.03230).\n", "There is a lot more to explore in the self-supervised domain, and more impressive steps are to be expected.\n", "\n", "### References\n", "\n", "[1] Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020).\n", "A simple framework for contrastive learning of visual representations.\n", "In International conference on machine learning (pp.\n", "1597-1607).\n", "PMLR.\n", "([link](https://arxiv.org/abs/2002.05709))\n", "\n", "[2] Chen, T., Kornblith, S., Swersky, K., Norouzi, M., and Hinton, G. (2020).\n", "Big self-supervised models are strong semi-supervised learners.\n", "NeurIPS 2020 ([link](https://arxiv.org/abs/2006.10029)).\n", "\n", "[3] Oord, A. V. 
D., Li, Y., and Vinyals, O.\n", "(2018).\n", "Representation learning with contrastive predictive coding.\n", "arXiv preprint arXiv:1807.03748.\n", "([link](https://arxiv.org/abs/1807.03748))\n", "\n", "[4] Grill, J.B., Strub, F., Altch\u00e9, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., Azar, M.G.\n", "and Piot, B.\n", "(2020).\n", "Bootstrap your own latent: A new approach to self-supervised learning.\n", "arXiv preprint arXiv:2006.07733.\n", "([link](https://arxiv.org/abs/2006.07733))\n", "\n", "[5] Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C. and Krishnan, D. (2020).\n", "Supervised contrastive learning.\n", "arXiv preprint arXiv:2004.11362.\n", "([link](https://arxiv.org/abs/2004.11362))\n", "\n", "[6] Zbontar, J., Jing, L., Misra, I., LeCun, Y. and Deny, S. (2021).\n", "Barlow twins: Self-supervised learning via redundancy reduction.\n", "arXiv preprint arXiv:2103.03230.\n", "([link](https://arxiv.org/abs/2103.03230))"]}, {"cell_type": "markdown", "id": "3b47272b", "metadata": {"papermill": {"duration": 0.023019, "end_time": "2023-03-14T16:43:24.980360", "exception": false, "start_time": "2023-03-14T16:43:24.957341", "status": "completed"}, "tags": []}, "source": ["## Congratulations - Time to Join the Community!\n", "\n", "Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the Lightning\n", "movement, you can do so in the following ways!\n", "\n", "### Star [Lightning](https://github.com/Lightning-AI/lightning) on GitHub\n", "The easiest way to help our community is just by starring the GitHub repos! This helps raise awareness of the cool\n", "tools we're building.\n", "\n", "### Join our [Slack](https://www.pytorchlightning.ai/community)!\n", "The best way to keep up to date on the latest advancements is to join our community! 
Make sure to introduce yourself\n", "and share your interests in the `#general` channel.\n", "\n", "\n", "### Contributions!\n", "The best way to contribute to our community is to become a code contributor! At any time you can go to the\n", "[Lightning](https://github.com/Lightning-AI/lightning) or [Bolt](https://github.com/Lightning-AI/lightning-bolts)\n", "GitHub Issues page and filter for \"good first issue\".\n", "\n", "* [Lightning good first issue](https://github.com/Lightning-AI/lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)\n", "* [Bolt good first issue](https://github.com/Lightning-AI/lightning-bolts/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)\n", "* You can also contribute your own notebooks with useful examples!\n", "\n", "### Great thanks from the entire PyTorch Lightning Team for your interest!\n", "\n", "[![PyTorch Lightning](){height=\"60px\" width=\"240px\"}](https://pytorchlightning.ai)"]}, {"cell_type": "raw", "metadata": {"raw_mimetype": "text/restructuredtext"}, "source": [".. customcarditem::\n", " :header: Tutorial 13: Self-Supervised Contrastive Learning with SimCLR\n", " :card_description: In this tutorial, we will take a closer look at self-supervised contrastive learning. 
Self-supervised learning, or also sometimes called unsupervised learning, describes the...\n", " :tags: Image,Self-Supervised,Contrastive-Learning,GPU/TPU,UvA-DL-Course\n", " :image: _static/images/course_UvA-DL/13-contrastive-learning.jpg"]}], "metadata": {"jupytext": {"cell_metadata_filter": "colab_type,colab,id,-all", "formats": "ipynb,py:percent", "main_language": "python"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16"}, "papermill": {"default_parameters": {}, "duration": 805.936782, "end_time": "2023-03-14T16:43:28.131656", "environment_variables": {}, "exception": null, "input_path": "course_UvA-DL/13-contrastive-learning/SimCLR.ipynb", "output_path": ".notebooks/course_UvA-DL/13-contrastive-learning.ipynb", "parameters": {}, "start_time": "2023-03-14T16:30:02.194874", "version": "2.4.0"}, "widgets": {"application/vnd.jupyter.widget-state+json": {"state": {"0fab58f6815044039651fe8ed64636ad": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": ["IPY_MODEL_4755d2c015834fe28791780e490312ee", "IPY_MODEL_d904204946034b379397dbfcc09bc690", "IPY_MODEL_b1c705ad63734ce08156896b370b7c9f"], "layout": "IPY_MODEL_725330fee67b40ecb3ba8dd01908d8aa", "tabbable": null, "tooltip": null}}, "10af4d63e08e45989276e2b09ebc4202": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", 
"_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "122ba60033424a579185ffd1d17cb947": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "227ba7baedec46ca8bb45f2e9e519722": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "25164819ad52452a9b679bcf291ad971": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, 
"height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "27587511bef04ec48d50d235ada8c68a": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "298ff6b37e804133948243fb1eee0bd1": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "bar_color": null, "description_width": ""}}, "2bb0e7aa943a41668de69231587362da": {"model_module": 
"@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "3b829a87d73d4e5581b67c3026a9f7bd": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, 
"justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "3f95f7919038413281002363097395d0": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": ["IPY_MODEL_93a0cc180d9f40fe8b9885a889f6ec92", "IPY_MODEL_f6f0c0e738294bb08173e37f9776096a", "IPY_MODEL_6ca501f7de10451faad6016ccdf6b559"], "layout": "IPY_MODEL_7069ed6298484ff3830b418ea091e82c", "tabbable": null, "tooltip": null}}, "4755d2c015834fe28791780e490312ee": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_645972547d9144ca8f80470c6664a2da", "placeholder": "\u200b", "style": "IPY_MODEL_65a2f2a08f9c4c84beab579be11d9c42", "tabbable": null, "tooltip": null, "value": "Testing DataLoader 0: 100%"}}, "4a03a2fa51e74b9692b01d965566f34c": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": 
"LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "4bd13820b08f43b591a55ad7150043f0": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_27587511bef04ec48d50d235ada8c68a", "placeholder": "\u200b", "style": "IPY_MODEL_8e232c047ded4fcbb016dbf66698ae4a", "tabbable": null, "tooltip": null, "value": " 125/125 [00:10<00:00, 60.53it/s]"}}, "599d959c7ae3456c8d6ccf9c33781148": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "5fde1f2547174b81809ba1ebea56b1ef": 
{"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": ["IPY_MODEL_7f339bec549249ecac9af0e6f2b08906", "IPY_MODEL_b64f1c42cf524aa5906e9c8d6dc613f2", "IPY_MODEL_76182af8ef104f8982e989cb8324b2f7"], "layout": "IPY_MODEL_2bb0e7aa943a41668de69231587362da", "tabbable": null, "tooltip": null}}, "645972547d9144ca8f80470c6664a2da": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "65a2f2a08f9c4c84beab579be11d9c42": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", 
"_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "6c214464531a48c38a4cf8c9e528cb59": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": ["IPY_MODEL_ed85c1662c154086a689c53f8e58bb1e", "IPY_MODEL_bd19de7d60fc4b6f9e5916b4fdde7cc6", "IPY_MODEL_748facf2961c4e3d9a529414aca4508a"], "layout": "IPY_MODEL_c3ab7d9ae291409c822feadccf3e1a3c", "tabbable": null, "tooltip": null}}, "6ca501f7de10451faad6016ccdf6b559": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_4a03a2fa51e74b9692b01d965566f34c", "placeholder": "\u200b", "style": "IPY_MODEL_ea912d022774434ba7706f16f2bfd94f", "tabbable": null, "tooltip": null, "value": " 78/78 [00:00<00:00, 151.44it/s]"}}, "6cf27fb420f04d648a59dca934c9db7b": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", 
"align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "6edc9483a5de46dc92d959f9ee370f6f": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "6f867ae7b37a47b8b9d74187228dafd3": 
{"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "bar_color": null, "description_width": ""}}, "7069ed6298484ff3830b418ea091e82c": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": "inline-flex", "flex": null, "flex_flow": "row wrap", "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": "100%"}}, "7128571c8817401caf2bc0413a6c136f": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, 
"align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "725330fee67b40ecb3ba8dd01908d8aa": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": "inline-flex", "flex": null, "flex_flow": "row wrap", "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": "100%"}}, "74434fbd53cd4abbacbf45df96bf7623": {"model_module": 
"@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "748facf2961c4e3d9a529414aca4508a": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_9b80c7bef15742419a194a2ebd9be187", "placeholder": "\u200b", "style": "IPY_MODEL_a345d4f627b647a89af15c3f439b681d", "tabbable": null, "tooltip": null, "value": " 2640397119/2640397119 [11:28<00:00, 2472513.66it/s]"}}, "76182af8ef104f8982e989cb8324b2f7": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": 
"@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_25164819ad52452a9b679bcf291ad971", "placeholder": "\u200b", "style": "IPY_MODEL_f4bdef3d30da43ca85d8c6ae3344d8c4", "tabbable": null, "tooltip": null, "value": " 79/79 [00:07<00:00, 30.74it/s]"}}, "7f339bec549249ecac9af0e6f2b08906": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_74434fbd53cd4abbacbf45df96bf7623", "placeholder": "\u200b", "style": "IPY_MODEL_b026e0213d90485c9a87bc3040b459f3", "tabbable": null, "tooltip": null, "value": "100%"}}, "8e232c047ded4fcbb016dbf66698ae4a": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "93a0cc180d9f40fe8b9885a889f6ec92": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", 
"_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_6cf27fb420f04d648a59dca934c9db7b", "placeholder": "\u200b", "style": "IPY_MODEL_227ba7baedec46ca8bb45f2e9e519722", "tabbable": null, "tooltip": null, "value": "Testing DataLoader 0: 100%"}}, "94cb436019d04ed9bedda7e2ed984e70": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "bar_color": null, "description_width": ""}}, "9b80c7bef15742419a194a2ebd9be187": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "a345d4f627b647a89af15c3f439b681d": {"model_module": "@jupyter-widgets/controls", 
"model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "afa94de1b2fb403a9ed129ace655eaed": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "bar_color": null, "description_width": ""}}, "b026e0213d90485c9a87bc3040b459f3": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "b1c705ad63734ce08156896b370b7c9f": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_fdf85f79faef4cafa2aa8c94157183f0", "placeholder": "\u200b", "style": "IPY_MODEL_10af4d63e08e45989276e2b09ebc4202", "tabbable": null, "tooltip": null, "value": " 125/125 [00:00<00:00, 140.30it/s]"}}, 
"b64f1c42cf524aa5906e9c8d6dc613f2": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_e7bf71f445ea4711a2d15673bede488f", "max": 79.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_94cb436019d04ed9bedda7e2ed984e70", "tabbable": null, "tooltip": null, "value": 79.0}}, "bd19de7d60fc4b6f9e5916b4fdde7cc6": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_ff7dd77ccb79403ba246cf28dcacafb8", "max": 2640397119.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_d8252727efec4b9696c62c040c8ca468", "tabbable": null, "tooltip": null, "value": 2640397119.0}}, "c1bbd16648f1460ea67a83a4fa3b073b": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": 
null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "c3ab7d9ae291409c822feadccf3e1a3c": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "c927016679434a06ba6389542d2ebcc0": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": 
"@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_c1bbd16648f1460ea67a83a4fa3b073b", "placeholder": "\u200b", "style": "IPY_MODEL_122ba60033424a579185ffd1d17cb947", "tabbable": null, "tooltip": null, "value": "100%"}}, "d0ac3e797cdc42c991acb404ed803fd1": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": "2", "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "d8252727efec4b9696c62c040c8ca468": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", 
"_view_name": "StyleView", "bar_color": null, "description_width": ""}}, "d904204946034b379397dbfcc09bc690": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_f33ca5f52e5948df8a3eea9228711497", "max": 125.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_298ff6b37e804133948243fb1eee0bd1", "tabbable": null, "tooltip": null, "value": 125.0}}, "e7bf71f445ea4711a2d15673bede488f": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "ea912d022774434ba7706f16f2bfd94f": 
{"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "ec0cadba87214d2ca07a45b56f499a33": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_7128571c8817401caf2bc0413a6c136f", "max": 125.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_6f867ae7b37a47b8b9d74187228dafd3", "tabbable": null, "tooltip": null, "value": 125.0}}, "ed85c1662c154086a689c53f8e58bb1e": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_3b829a87d73d4e5581b67c3026a9f7bd", "placeholder": "\u200b", "style": "IPY_MODEL_599d959c7ae3456c8d6ccf9c33781148", "tabbable": null, "tooltip": null, "value": "100%"}}, "eec7abe112ae4eeeacd6880e100b1e0d": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": {"_dom_classes": [], "_model_module": 
"@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": ["IPY_MODEL_c927016679434a06ba6389542d2ebcc0", "IPY_MODEL_ec0cadba87214d2ca07a45b56f499a33", "IPY_MODEL_4bd13820b08f43b591a55ad7150043f0"], "layout": "IPY_MODEL_6edc9483a5de46dc92d959f9ee370f6f", "tabbable": null, "tooltip": null}}, "f33ca5f52e5948df8a3eea9228711497": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": "2", "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "f4bdef3d30da43ca85d8c6ae3344d8c4": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": 
"2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null}}, "f6f0c0e738294bb08173e37f9776096a": {"model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_d0ac3e797cdc42c991acb404ed803fd1", "max": 78.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_afa94de1b2fb403a9ed129ace655eaed", "tabbable": null, "tooltip": null, "value": 78.0}}, "fdf85f79faef4cafa2aa8c94157183f0": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": 
null}}, "ff7dd77ccb79403ba246cf28dcacafb8": {"model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}}, "version_major": 2, "version_minor": 0}}}, "nbformat": 4, "nbformat_minor": 5}