Training on TPU

Hi all, I created this blog post (and Colab notebook) using Lightning, but I could not get it to run on TPUs just by setting tpu_cores=8. It works perfectly fine on GPUs.
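For reference, this is essentially the only change between the two runs (MyLightningModule and the rest of the setup are stand-ins for my actual code):

import pytorch_lightning as pl

model = MyLightningModule()  # stand-in for my actual model

# this run works perfectly fine
trainer = pl.Trainer(gpus=1)
trainer.fit(model)

# switching to TPU is the only change, and this fails
trainer = pl.Trainer(tpu_cores=8)
trainer.fit(model)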

Is there anything you can suggest for converting it to run on TPUs?

My first thought was that calling the tokenizer inside training_step was where I went wrong, but it’s probably not what’s actually going wrong.

from typing import List, Tuple

import torch

def common_step(self, batch: Tuple[torch.Tensor, List[str]]) -> torch.Tensor:
    images, text = batch
    device = images.device
    # tokenize on the fly and move the token tensors onto the same device as the images
    text_dev = {k: v.to(device) for k, v in self.tokenizer(text).items()}
    ...

I’m using images.device rather than specifying a device explicitly, since I won’t know which of the 8 TPU cores I’m on.

I also tried moving the tokenizing step inside my Dataset class without much luck either, unless I did that wrong.
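In case I just did it wrong, here’s roughly what that attempt looked like (simplified; MyDataset, the stored images, and the tokenizer call are stand-ins for my actual code):

from typing import List, Tuple

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, images: List[torch.Tensor], texts: List[str], tokenizer):
        self.images = images
        self.texts = texts
        self.tokenizer = tokenizer

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, dict]:
        # tokenize per sample here instead of in training_step, so the batch
        # already contains tensors and Lightning can move them to the right core
        tokens = self.tokenizer(self.texts[idx])
        return self.images[idx], tokens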

Any thoughts on how to debug this as well? TPU mode doesn’t give me an understandable stack trace to follow.
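For example, would running on a single core first be a sensible way to get a cleaner traceback? Something like this (just a guess on my part):

import pytorch_lightning as pl

# single core: no multiprocessing across 8 cores, so hopefully the real
# exception surfaces as a readable traceback
trainer = pl.Trainer(tpu_cores=1)
trainer.fit(model)  # model as in the earlier sketch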