Convolutional neural networks (CNNs) can model complicated non-linear relations between images. However, they are notoriously sensitive to small changes in the input. Most CNNs trained to describe image-to-image mappings generate temporally unstable results when applied to video sequences, leading to flickering artifacts and other inconsistencies over time. To use CNNs for video material, previous methods have either relied on estimating dense frame-to-frame motion information (optical flow) in the training and/or the inference phase, or explored recurrent learning structures. We take a different approach to the problem, posing temporal stability as a regularization of the cost function. The regularization is formulated to account for different types of motion that can occur between frames, so that temporally stable CNNs can be trained without the need for video material or expensive motion estimation. The training can be performed as a fine-tuning operation, without architectural modifications of the CNN. Our evaluation shows that the training strategy leads to large improvements in temporal smoothness. Moreover, for small datasets the regularization can help boost the generalization performance to a much larger extent than is possible with naive augmentation strategies.
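To make the idea of a single-frame regularizer concrete, the following is a minimal sketch of one way such a term could look; it is not taken from the paper or its supplementary implementation. It assumes the motion between frames can be imitated by a small random translation of a single image, and it penalizes the difference between the output for the translated input and the translated output. The names `translate`, `consistency_penalty`, `regularized_loss`, the weight `alpha`, and the range `max_shift` are all hypothetical, and plain Python lists stand in for image tensors.

```python
import random

def translate(img, dx, dy):
    """Circularly shift a 2-D image by (dx, dy) pixels.

    Assumption: a small translation is used as a stand-in for the motion
    that could occur between two consecutive video frames.
    """
    h, w = len(img), len(img[0])
    return [[img[(r - dy) % h][(c - dx) % w] for c in range(w)]
            for r in range(h)]

def mse(a, b):
    """Mean squared error between two images of equal size."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / n

def consistency_penalty(f, x, dx, dy):
    """Compare f(T(x)) with T(f(x)) for a translation T.

    The penalty is zero when the mapping f reacts to the simulated
    motion exactly as the image content itself does.
    """
    return mse(f(translate(x, dx, dy)), translate(f(x), dx, dy))

def regularized_loss(f, x, y, alpha=0.5, max_shift=2, rng=random):
    """Reconstruction loss plus a weighted consistency term (hypothetical form)."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return mse(f(x), y) + alpha * consistency_penalty(f, x, dx, dy)

# Toy mappings standing in for a CNN: one per-pixel, one position-dependent.
def pointwise(img):
    return [[2.0 * p for p in row] for row in img]

def position_dependent(img):
    return [[p + c for c, p in enumerate(row)] for row in img]

x = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(consistency_penalty(pointwise, x, 1, 0))           # consistent under shifts
print(consistency_penalty(position_dependent, x, 1, 0))  # penalized
```

In this toy setting only a single image `x` and its target `y` are needed, which mirrors the claim that no video material or optical flow is required. In practice `f` would be the CNN being fine-tuned, the loss would be minimized with a deep learning framework, and the set of transformations would likely be richer than pure translation.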



Download Size Description
Paper 5.6Mb Paper.
Supplementary document 0.2Mb Supporting document, with some details as well as an implementation example.
Video 105Mb The video overview presented above.



Gabriel Eilertsen, Rafał K. Mantiuk, Jonas Unger. Single-frame Regularization for Temporally Stable CNNs. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.


  author       = "Eilertsen, Gabriel and 
                  Mantiuk, Rafa{\l} K. and
                  Unger, Jonas",
  title        = "Single-frame Regularization for Temporally Stable CNNs",
  booktitle    = "The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)",
  month        = "June",
  year         = "2019"