Hey all, as the title says, I am trying to use SWA with a cyclic learning-rate schedule/annealing strategy. I was thrilled to find that Lightning implements SWA. I understand it is a relatively new Lightning feature and still experimental, so its documentation isn’t complete yet, and I might’ve missed something.

Basically, I am trying to do what is described in the PyTorch blog, where a cyclic annealing strategy is used instead of a linear or cosine one. The weights at the end of each cycle are then averaged to get the final model. I’ve found a similar implementation in Keras but none here. Any help would be appreciated!
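To make the idea concrete, here is a minimal plain-PyTorch sketch of the behaviour I have in mind. It is not Lightning's callback, just an illustration using `torch.optim.swa_utils.AveragedModel` and a `LambdaLR` schedule. The toy model, the random data, and the constants `CYCLE_LEN`, `NUM_CYCLES`, `LR_MAX`, and `LR_MIN` are placeholders I made up for the example, and the linear decay within each cycle is my assumption about the shape of the cycle.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR
from torch.optim.swa_utils import AveragedModel

# Hypothetical values chosen only for illustration.
CYCLE_LEN = 5            # optimizer steps per cycle
NUM_CYCLES = 3           # number of cycles to average over
LR_MAX, LR_MIN = 0.1, 0.01


def cyclic_factor(step: int) -> float:
    """Multiplier on LR_MAX: decays linearly within a cycle, then jumps back to 1."""
    pos = (step % CYCLE_LEN) / (CYCLE_LEN - 1)   # 0.0 at cycle start, 1.0 at cycle end
    return 1.0 - pos * (1.0 - LR_MIN / LR_MAX)


torch.manual_seed(0)
model = nn.Linear(4, 1)                           # toy stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=LR_MAX)
scheduler = LambdaLR(optimizer, lr_lambda=cyclic_factor)
swa_model = AveragedModel(model)                  # equal-weight running average by default

x, y = torch.randn(32, 4), torch.randn(32, 1)     # toy data
lrs = []                                          # learning rate used at each step

for step in range(CYCLE_LEN * NUM_CYCLES):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

    # At the end of each cycle the LR is at its minimum:
    # fold the current weights into the running average.
    if (step + 1) % CYCLE_LEN == 0:
        swa_model.update_parameters(model)

    scheduler.step()
```

After the loop, `swa_model` holds the average of the weights captured at the end of each cycle. If the real model has batch-norm layers, their statistics would presumably still need to be recomputed for the averaged weights, for example with `torch.optim.swa_utils.update_bn`.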

Image for reference from the above blog: