Computing model output over dataset

Would this work? If you collect the predicted probabilities and targets for each batch in your validation step, you could override the validation epoch-end method to aggregate them over the whole dataset and save them to a file.
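
Here is a rough sketch of that pattern, in case it helps. It is deliberately framework-free so it runs on its own: `PredictionCollector`, `run_validation_epoch`, the JSON output format and the file name are all made up for illustration. If you are on PyTorch Lightning, the two methods would live on your `LightningModule` instead (`validation_step` and `on_validation_epoch_end`, or `validation_epoch_end` in older versions), and you would probably store tensors and concatenate them rather than extend Python lists.

```python
import json
from pathlib import Path


class PredictionCollector:
    """Hypothetical stand-in for the validation hooks of a training module."""

    def __init__(self, out_path):
        self.out_path = Path(out_path)
        self._probs = []    # predicted probabilities gathered during the epoch
        self._targets = []  # matching ground-truth targets

    def validation_step(self, probs, targets):
        # Called once per validation batch: only store the batch outputs.
        self._probs.extend(probs)
        self._targets.extend(targets)

    def on_validation_epoch_end(self):
        # Called once after the last batch: aggregate and write to file.
        records = [
            {"prob": p, "target": t}
            for p, t in zip(self._probs, self._targets)
        ]
        self.out_path.write_text(json.dumps(records))
        # Reset the buffers so the next epoch starts empty.
        self._probs.clear()
        self._targets.clear()
        return len(records)


def run_validation_epoch(collector, batches):
    # Mimics what a trainer does: one step per batch, then the epoch-end hook.
    for probs, targets in batches:
        collector.validation_step(probs, targets)
    return collector.on_validation_epoch_end()
```

You would call it with something like `run_validation_epoch(PredictionCollector("val_outputs.json"), batches)`, where each batch is a `(probs, targets)` pair. Clearing the buffers at the end matters: without it, outputs from earlier epochs pile up in the saved file.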