How to train my model using a docker container

I am trying to train a dummy model using Lightning.
This is what I have in my Dockerfile:

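FROM python:3.8
WORKDIR /app
# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r ./requirements.txt
# Copy the training code into the image (assumes train.py lives under src/)
COPY src/ ./src/
# Exec form needs each argument as its own array element
CMD ["lightning", "run", "app", "src/train.py"]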

I also have a LightningWork component in the train.py file referenced in the Dockerfile.
If this is not the right approach, how can I train the model (not deploy it) using Docker?

@Prasanth_Noel This looks pretty good already. Is it not working well?
To be clear, you want to containerize your training/application to run locally on your machine, using LightningWork?

To make sure everything works well, you can take one of the examples from our docs and put it in train.py to verify the app starts correctly.

Note that if you want to run the app/work in the Lightning cloud, it’s even easier since you don’t need a special docker image (usually) and it will install your requirements.txt, Python, pip etc. automatically for you.

@awaelchli, we cannot always rely on a single option. We are thinking of containerizing the application with Docker, which has always been a robust way to either train or deploy.
We have already been told that we can use requirements.txt, and we did. There is no problem training the model that way. But when we try to do the same in a Docker environment, we encounter the following problems.
Training locally (yes, I am using a LightningWork component in the Python script):

  1. I can do this with Docker and Py3.6, with no lightning in requirements.txt.
  2. This has been “good enough” up until now.
  3. Training locally and training on the cluster are inter-related in terms of the environment (i.e., we have to be able to train locally and on the cluster using the same environment).
  4. So now we have to figure out how to include lightning in the training environment (even for local training), which was not the case before.
  5. When I add lightning to requirements.txt, I can build the environment in Docker, but only with Py3.8 (not 3.6).
  6. But when I try to train the model in the environment with lightning and Py3.8, I get a runtime error that is implicitly related to using Py3.8.
  7. When I try to build the environment with Py3.8 in conda (with lightning, etc.), I get errors while building the conda environment (wandb, lightning, etc. missing).
  8. I did not explore venv; I am not sure how to specify the Python version there.
  9. I would rather investigate pipenv than venv (but would prefer Docker or conda).

Training remotely on cluster

  1. I feel I should not pursue this further until we have an environment in which we can train locally and which also has lightning in the build.
  2. Again, as above:
    1. I can train locally with Docker and Py3.6, which does not have lightning in it.
    2. I have to use Py3.8 to add lightning to the Docker build, which does build successfully.
    3. Unfortunately, when I use Py3.8 and lightning in the environment, I get a runtime error with code that otherwise worked.
  3. When I try to build a conda env with Py3.8, I get the errors listed above.
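
A small check like the one below, run in both the Py3.6 and the Py3.8 environments, might help tell whether the runtime error comes from the interpreter version or from a missing package. This is only a minimal sketch using the standard library: the helper name `check_environment` is hypothetical, and the package names (`lightning`, `wandb`) are assumptions taken from the errors described above.

```python
import importlib.util
import platform


def check_environment(packages):
    """Return the interpreter version and, per package, whether it can be found."""
    report = {"python": platform.python_version()}
    for name in packages:
        # find_spec locates a top-level package without actually importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    # Package names are assumptions based on the errors mentioned in this thread
    for key, value in check_environment(["lightning", "wandb"]).items():
        print(f"{key}: {value}")
```

Running it inside the container and again in the conda environment should show whether both resolve the same Python version and the same set of packages. It only reports whether a package can be located, not whether its version is compatible.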