Loading Partitioned Parquet files using Petastorm

I have my training and validation datasets stored as Parquet files (the output of a PySpark-based data-prep pipeline). I am currently using Petastorm to load them, like so:

import pytorch_lightning as pl
from petastorm import make_reader
from petastorm.pytorch import DataLoader  # Petastorm's loader, not torch.utils.data.DataLoader

class ItemsageDataModule(pl.LightningDataModule):
    def __init__(self, **kwargs):
        super().__init__()
        self.train_path = kwargs['train_path']
        self.val_path = kwargs['val_path']
        self.batch_size = kwargs['bsz']

    def setup(self, stage=None):
        pass

    def train_dataloader(self):
        # A fresh reader is created on every call, so each reload starts a new pass.
        self.reader_train = make_reader(self.train_path, num_epochs=1, seed=1, shuffle_rows=True)
        return DataLoader(self.reader_train, batch_size=self.batch_size)

    def val_dataloader(self):
        self.reader_val = make_reader(self.val_path, num_epochs=1, seed=1, shuffle_rows=False)
        return DataLoader(self.reader_val, batch_size=self.batch_size)
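My understanding of why the readers have to be built inside train_dataloader()/val_dataloader() rather than in setup(): a num_epochs=1 reader is a one-shot iterator, so once it is exhausted a second epoch would yield nothing. A toy sketch of that behavior (a plain Python generator standing in for the Petastorm reader; make_toy_reader is a hypothetical name):

```python
def make_toy_reader(num_rows):
    """Stands in for make_reader(..., num_epochs=1): a one-shot iterator."""
    return iter(range(num_rows))

# Reusing one reader across epochs: the second pass is empty.
reader = make_toy_reader(5)
epoch1 = list(reader)   # [0, 1, 2, 3, 4]
epoch2 = list(reader)   # [] -- already exhausted

# Recreating the reader each epoch (what reload_dataloaders_every_n_epochs=1
# triggers via a fresh train_dataloader() call) gives a full pass every time.
epoch3 = list(make_toy_reader(5))   # [0, 1, 2, 3, 4]
```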

I read in a related post that when using Petastorm I might need to reload the datasets every epoch (since each reader is created with num_epochs=1). This is what my trainer looks like:

trainer = Trainer(
    callbacks=callbacks,
    max_epochs=model_args['n_epochs'],
    max_steps=1000,
    # num_sanity_val_steps=0,
    accelerator="gpu",
    devices=4,
    # num_nodes=-1,
    strategy="deepspeed",
    deterministic=True,
    # precision='16-mixed',
    default_root_dir=log_dir,
    reload_dataloaders_every_n_epochs=1,
    benchmark=True,
    # use_distributed_sampler=True,
    enable_progress_bar=True,
    enable_model_summary=True,
    check_val_every_n_epoch=1,
    # precision='32-mixed',
    logger=logger,
)
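One thing I am unsure about with devices=4: with use_distributed_sampler commented out, each rank may read the full dataset unless the reader itself is sharded. Petastorm's make_reader accepts cur_shard and shard_count for this (my reading of the API; worth verifying against your version), and my understanding is that row groups are assigned to shards essentially round-robin, sketched here in plain Python:

```python
def shard_assignment(num_row_groups, shard_count):
    """Round-robin split of row groups across shards, mimicking (as I understand it)
    make_reader(..., cur_shard=rank, shard_count=world_size)."""
    return {
        shard: [g for g in range(num_row_groups) if g % shard_count == shard]
        for shard in range(shard_count)
    }

# 10 row groups over 4 GPU ranks: every group is read by exactly one rank.
assignment = shard_assignment(10, 4)
# assignment[0] == [0, 4, 8], assignment[1] == [1, 5, 9], etc.
```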

Is this the right way to go about it, or is there something more efficient?

For context, my training set has ~13B examples and my validation set is about 5% of that size.
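At that scale, max_steps dominates: a back-of-envelope calculation (the batch size below is a placeholder, not my real bsz) shows that 1000 steps covers only a tiny fraction of one pass over 13B rows, so the epoch-reload behavior barely kicks in:

```python
train_rows = 13_000_000_000      # ~13B examples
batch_size = 1024                # placeholder; substitute the real bsz
devices = 4                      # global batch = batch_size * devices

steps_per_epoch = train_rows // (batch_size * devices)
fraction_covered = 1000 / steps_per_epoch   # share of one epoch seen under max_steps=1000
```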