Regression-based model for videos

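Each video has a consumption score. A video counts as consumed by a viewer when playback continues for more than 3 seconds, and the score is the fraction of viewers who consumed it: if 50 out of 100 people consumed the video, its consumption score is 0.5.
This works like scrolling through YouTube, where videos start playing on their own: if a video keeps playing for more than 3 seconds, it counts as consumed.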
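To make the definition concrete, here is a minimal sketch of how the score could be computed from per-viewer playback durations. The function name `consumption_score` and the input format (a list of seconds watched, one entry per viewer) are assumptions for illustration, not part of any existing dataset or API.

```python
def consumption_score(watch_seconds, threshold=3.0):
    """Fraction of viewers whose playback lasted longer than `threshold` seconds.

    watch_seconds: list of playback durations in seconds, one per viewer
    (hypothetical input format).
    """
    if not watch_seconds:
        raise ValueError("need at least one view to compute a score")
    # A view is "consumed" only if it ran for MORE than the threshold.
    consumed = sum(1 for s in watch_seconds if s > threshold)
    return consumed / len(watch_seconds)


# 100 viewers: 50 scrolled past quickly, 50 kept watching.
views = [0.5] * 50 + [10.0] * 50
print(consumption_score(views))  # 0.5
```

The resulting value always lies between 0 and 1, which matters when choosing the output layer or target transform of the regression model.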
I want to predict these scores with a regression model based on the video frames and, if possible, also on the description provided with each video.
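One possible starting point, sketched below with NumPy only, is to turn each video into a fixed-length feature vector, append simple text features from the description, and fit a ridge regression on the logit of the score so that predictions stay between 0 and 1. Everything here is an assumption for illustration: the colour-statistics features, the tiny vocabulary, the helper names, and the random data. In practice the hand-made frame features would likely be replaced by embeddings from a pretrained image or video network.

```python
import numpy as np


def video_features(frames):
    """Per-channel mean and std over all frames (a crude stand-in for
    learned features). frames: (num_frames, height, width, 3) in [0, 255]."""
    frames = np.asarray(frames, dtype=np.float64) / 255.0
    return np.concatenate([frames.mean(axis=(0, 1, 2)),
                           frames.std(axis=(0, 1, 2))])


def text_features(description, vocab):
    """Bag-of-words counts over a fixed, hypothetical vocabulary."""
    words = description.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=np.float64)


def add_bias(X):
    """Append a column of ones so the model can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on logit-transformed scores."""
    y = np.clip(y, 1e-3, 1 - 1e-3)       # avoid log(0) at scores of 0 or 1
    z = np.log(y / (1 - y))              # logit of the consumption score
    Xb = add_bias(X)
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0                    # leave the intercept unregularised
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ z)


def predict(X, w):
    """Map linear outputs back to (0, 1) with a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(add_bias(X) @ w)))


# Synthetic stand-in data: 20 tiny random "videos" with descriptions.
rng = np.random.default_rng(0)
vocab = ["funny", "tutorial", "music"]
texts = ["funny music clip", "tutorial for beginners", "music video", "funny"]
videos = [rng.integers(0, 256, size=(8, 16, 16, 3)) for _ in range(20)]
descriptions = [texts[i % len(texts)] for i in range(20)]
scores = rng.uniform(0.0, 1.0, size=20)  # made-up consumption scores

X = np.stack([np.concatenate([video_features(v), text_features(d, vocab)])
              for v, d in zip(videos, descriptions)])
w = fit_ridge(X, scores)
preds = predict(X, w)
print(preds.shape)
```

Fitting on the logit rather than on the raw score is a design choice: a plain linear model could predict values below 0 or above 1, while the sigmoid keeps every prediction a valid fraction.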
Thanks.