I have a Graph Neural Network that operates on a directed multigraph, where the Data class comes from torch_geometric. The data has the following form:
Data(x=[420, 13], edge_index=[2, 1248], edge_attr=[1248, 2, 718], y=[420], train_mask=[420], test_mask=[420], val_mask=[420])
where both nodes and edges have attributes. I tried to convert the following training script (taken from the GitHub repository of a paper) to PyTorch Lightning:
# weighted loss preparation
train_class_ratio = dataset.y[dataset.train_mask].sum().item()/dataset.y[dataset.train_mask].shape[0]
train_class_weights = torch.Tensor([train_class_ratio,1-train_class_ratio]).to(device)
# training loop
start = time.time()
for epoch in range(epochs):
    optimizer.zero_grad()
    loss = F.nll_loss(model(data)[data.train_mask], data.y[data.train_mask], weight=train_class_weights)
    loss.backward()
    optimizer.step()
# calculate final accuracy
model.eval()
test_acc = (
    model(data).max(dim=1)[1][data.test_mask].eq(data.y[data.test_mask]).sum().item()
    / data.test_mask.sum().item()
)
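To make sure I read the weighted-loss preparation correctly, here is a small worked example in plain Python. The labels and log-probabilities are made-up toy values, not taken from the real dataset; the normalisation by the sum of the per-sample weights is how `F.nll_loss` behaves with `weight` and the default mean reduction.

```python
# Toy training labels (hypothetical): three negatives, one positive.
train_labels = [0, 0, 0, 1]

# Same arithmetic as the script: fraction of positive (class 1) labels.
train_class_ratio = sum(train_labels) / len(train_labels)

# Class 0 is weighted by the positive ratio and class 1 by the negative
# ratio, so the rarer class receives the larger weight.
train_class_weights = [train_class_ratio, 1 - train_class_ratio]

# Hypothetical log-probabilities the model assigns to each node's true class.
log_probs_true = [-0.1, -0.2, -0.3, -2.0]

# Weighted mean NLL: sum(w_y * -log p) / sum(w_y).
weights_per_node = [train_class_weights[y] for y in train_labels]
weighted_nll = sum(
    w * -lp for w, lp in zip(weights_per_node, log_probs_true)
) / sum(weights_per_node)

print(train_class_weights, weighted_nll)
```

With these toy numbers, the single positive node carries three times the weight of each negative node, so its large loss dominates the average.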
From my understanding, the whole dataset is fed into the training loop for each epoch (I could be very wrong about this). And since this is graph-structured data, I do not know how to implement a proper DataLoader for this script. So far this is my attempt:
model.py
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class Model(pl.LightningModule):
    # model implementation
    def forward(self, data):
        ...

    def training_step(self, batch, batch_idx):
        # batch is the whole graph; forward() expects the Data object, and the masks live on it
        logits = self(batch)
        loss = F.nll_loss(logits[batch.train_mask], batch.y[batch.train_mask],
                          weight=self.params["train_class_weights"])
        self.log("train_loss", loss)
        return loss

    def test_step(self, batch, batch_idx):
        logits = self(batch)
        loss = F.nll_loss(logits[batch.test_mask], batch.y[batch.test_mask])
        acc = (logits[batch.test_mask].max(dim=1)[1] == batch.y[
            batch.test_mask]).sum().item() / batch.test_mask.sum().item()
        self.log('test_loss', loss)
        self.log('test_acc', acc)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.params["lr"], weight_decay=self.params["weight_decay"])
        return optimizer
training_script.ipynb
# load full dataset, define parameters
...
lgcn_model = Model(dataset, params)
trainer = pl.Trainer(max_epochs=params["epoch"])
trainer.fit(lgcn_model)
trainer.test(lgcn_model)
Running this raises the following error:
MisconfigurationException: train_dataloader must be implemented to be used with the Lightning Trainer
Could you help me implement the appropriate DataLoader for this case?
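One direction I am considering, as a rough sketch only: since the original loop passes the whole graph to the model once per epoch, a loader over a one-element list should reproduce that behaviour. The example below uses a plain `torch.utils.data.DataLoader` and a `SimpleNamespace` as a tiny stand-in for the real `Data` object (the sizes and values are made up), so it runs without torch_geometric.

```python
from types import SimpleNamespace

import torch
from torch.utils.data import DataLoader

# Stand-in for the torch_geometric Data object (hypothetical, tiny sizes).
graph = SimpleNamespace(
    x=torch.randn(4, 13),
    edge_index=torch.tensor([[0, 1, 2], [1, 2, 3]]),
    y=torch.tensor([0, 1, 0, 1]),
    train_mask=torch.tensor([True, True, False, False]),
)

# A one-element list is a valid map-style dataset. batch_size=None disables
# automatic batching, so the loader yields the graph object itself and one
# "epoch" consists of exactly one full-graph step.
loader = DataLoader([graph], batch_size=None, collate_fn=lambda g: g)

batches = list(loader)
print(len(batches), batches[0].x.shape)
```

With the real `Data` object, `torch_geometric.loader.DataLoader([data], batch_size=1)` is probably the more usual choice. Either loader could then be returned from a `train_dataloader` method on the module or passed to `trainer.fit` alongside the model; I have not verified this end to end.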