Cannot Access Checkpoint file while using trainer.fit

Hi, I am working on a binary image classification model and I keep running into this error every 2 epochs. Can anyone help me with this, please?
Here is the error:
PermissionError Traceback (most recent call last)
Cell In [18], line 1
----> 1 trainer.fit(model=model, train_dataloaders=train_loader)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py:696, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
677 r"""
678 Runs the full optimization routine.
679
(…)
693 datamodule: An instance of :class:`~pytorch_lightning.core.datamodule.LightningDataModule`.
694 """
695 self.strategy.model = model
--> 696 self._call_and_handle_interrupt(
697 self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
698 )

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py:650, in Trainer._call_and_handle_interrupt(self, trainer_fn, *args, **kwargs)
648 return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
649 else:
--> 650 return trainer_fn(*args, **kwargs)
651 # TODO(awaelchli): Unify both exceptions below, where KeyboardError doesn't re-raise
652 except KeyboardInterrupt as exception:

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py:735, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)

167 shutil.rmtree(p)
168 else:
--> 169 os.remove(p)

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'c:/Users/elmow/OneDrive/Documents/Projects/Detect1/.neptune/Untitled/DET-7/checkpoints/epoch=0-step=7.ckpt'

Hey @Elmo_Aphiwetsaa, it seems like your training process doesn’t have permission to access the model checkpoint file path.

Could you try saving it to a different folder? Maybe a folder other than OneDrive?

I tried a folder outside OneDrive and I still got the same error. Here is the traceback:
PermissionError Traceback (most recent call last)
Cell In [18], line 1
----> 1 trainer.fit(model=model, train_dataloaders=train_loader)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py:696, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
677 r"""
678 Runs the full optimization routine.
679
(…)
693 datamodule: An instance of :class:`~pytorch_lightning.core.datamodule.LightningDataModule`.
694 """
695 self.strategy.model = model
--> 696 self._call_and_handle_interrupt(
697 self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
698 )

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py:650, in Trainer._call_and_handle_interrupt(self, trainer_fn, *args, **kwargs)
648 return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
649 else:
--> 650 return trainer_fn(*args, **kwargs)
651 # TODO(awaelchli): Unify both exceptions below, where KeyboardError doesn't re-raise
652 except KeyboardInterrupt as exception:

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py:735, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)

167 shutil.rmtree(p)
168 else:
--> 169 os.remove(p)

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'c:/Users/elmow/Documents/Detect1/.neptune/Untitled/DET-8/checkpoints/epoch=0-step=7.ckpt'

Could you please check whether the Python process has write access to your file path using os.access?
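For reference, a minimal sketch of such a check using only the standard library. The helper name `describe_access` is hypothetical, and the result is only a hint: on Windows, os.access with os.W_OK mostly reflects the read-only attribute and does not evaluate the full access control list.

```python
import os
from pathlib import Path


def describe_access(path):
    """Report whether `path` exists and what os.access says about it."""
    p = Path(path)
    return {
        "exists": p.exists(),
        "readable": os.access(p, os.R_OK),
        "writable": os.access(p, os.W_OK),
        # Checkpoints are created and removed inside the parent folder,
        # so it is worth checking that folder as well.
        "parent_writable": os.access(p.parent, os.W_OK),
    }
```

Calling it on the checkpoint path from the traceback, and on the `checkpoints` folder itself, shows whether the file is actually there and whether the folder is writable.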

I tried
print(os.access('c:/Users/elmow/Documents/Detect2/.neptune/Untitled/DET-8/checkpoints/epoch=0-step=7.ckpt/path/to/folder', os.W_OK))
print(os.access('c:/Users/elmow/Documents/Detect2/.neptune/Untitled/DET-8/checkpoints/epoch=0-step=7.ckpt/path/to/folder', os.R_OK))
and both of them return False.
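One thing worth ruling out here: os.access returns False for any path that does not exist, whatever the permissions are. The path checked above ends in `epoch=0-step=7.ckpt/path/to/folder` and uses `Detect2` where the traceback shows `Detect1`, so the result may only mean that the path is not there. A small sketch that finds how much of a path really exists (the helper name `deepest_existing` is hypothetical):

```python
import os


def deepest_existing(path):
    """Return the longest leading part of `path` that exists on disk."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root, stop
            break
        current = parent
    return current
```

If the returned path is shorter than the one passed in, the False results say nothing about permissions; rerunning os.access on the returned path gives a meaningful answer.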

This issue occurs because you don’t have access to the file path where you are saving your model. You may need to check the permissions on that path.
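It may also help to look at the error code itself. [WinError 32] in the traceback is the Windows sharing-violation code, raised when some process still holds the file open; a plain permissions problem usually shows up as [WinError 5] Access is denied. Python raises PermissionError for both. A minimal sketch for telling them apart (the function name and the returned strings are hypothetical):

```python
import errno

SHARING_VIOLATION = 32  # the code shown in the traceback as [WinError 32]


def explain_permission_error(exc):
    """Give a rough reason for an OSError raised while deleting a file."""
    # `winerror` is only set on Windows, so fall back to None elsewhere.
    winerror = getattr(exc, "winerror", None)
    if winerror == SHARING_VIOLATION:
        return "file is open in another process"
    if exc.errno in (errno.EACCES, errno.EPERM):
        return "access denied by permissions"
    return "other"
```

It can be used by wrapping `os.remove(path)` on the checkpoint file in `try/except OSError as exc` and printing `explain_permission_error(exc)`.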

So it is because the user account I am using doesn’t have permission on the path? I don’t really understand how permissions work; can you please explain the basics? For example, does python.exe itself need access to the folder, or does Python have the same access as the user account?
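On the basics: a process normally runs under the account that started it, so python.exe has no permissions of its own and can do what that account can do. A direct way to test this is to let Python try to create a file in the folder. A minimal sketch, with hypothetical helper names:

```python
import getpass
import tempfile


def current_user():
    """Name of the account this Python process runs as, if it can be found."""
    try:
        return getpass.getuser()
    except Exception:  # some environments have no user name available
        return "unknown"


def can_create_files(folder):
    """Try to create a scratch file in `folder`; it is deleted on close."""
    try:
        with tempfile.TemporaryFile(dir=folder):
            return True
    except OSError:
        return False
```

Running `print(current_user(), can_create_files(folder))` with the checkpoint folder shows which account is in effect and whether it can write there.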