Pseudo Dataset Generation for Out-of-domain Multi-Camera View Recommendation


Abstract

Multi-camera systems are indispensable in movies, TV shows, and other media. Selecting the appropriate camera at every timestamp has a decisive impact on production quality and audience preferences. Learning-based view recommendation frameworks can assist professionals in decision-making. However, they often struggle outside of their training domains. The scarcity of labeled multi-camera view recommendation datasets exacerbates the issue. Based on the insight that many videos are edited from the original multi-camera videos, we propose transforming regular videos into pseudo-labeled multi-camera view recommendation datasets. Promisingly, by training the model on pseudo-labeled datasets stemming from videos in the target domain, we achieve a 68% relative improvement in the model's accuracy in the target domain and bridge the accuracy gap between in-domain and never-before-seen domains.


Multi-camera recommendation models struggle out-of-domain.



Multi-camera editing model performs poorly out-of-domain.
(a) A multi-camera recommendation model trained on a labeled multi-camera editing dataset from one domain generalizes poorly to a never-before-seen domain, and its accuracy drops significantly. (b) Our proposed method leverages regular videos to generate pseudo-labeled datasets for the target domain and improve the model’s accuracy.

Lack of labeled multi-camera editing datasets in different domains.

Few labeled multi-camera editing data

Intuition -- Videos are edited from original multi-camera videos.



Pseudo dataset generation intuition

Generate pseudo-labeled data from edited videos!



Pseudo dataset generation pipeline
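
The idea of turning an edited video into pseudo-labeled data can be illustrated with a minimal sketch. It is not the paper's implementation; it is one plausible reading of the intuition, under these assumptions: shot boundaries (cut positions) of the edited video are already known, each shot stands in for the feed of one camera, the shot the editor actually cut to is the positive label, and other shots from the same video act as distractor views. The names `PseudoSample`, `shots_from_cuts`, `make_pseudo_samples`, and the parameter `num_candidates` are hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class PseudoSample:
    context: tuple[int, int]           # (start, end) frames of the shot before the cut
    candidates: list[tuple[int, int]]  # frame ranges standing in for camera views
    label: int                         # index in `candidates` of the shot the editor chose


def shots_from_cuts(cuts, num_frames):
    """Turn sorted cut positions into half-open (start, end) frame ranges."""
    bounds = [0] + list(cuts) + [num_frames]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


def make_pseudo_samples(cuts, num_frames, num_candidates=4, seed=0):
    """Build one pseudo-labeled sample per cut (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    shots = shots_from_cuts(cuts, num_frames)
    samples = []
    for i in range(len(shots) - 1):
        positive = shots[i + 1]  # the view the editor actually cut to
        # Distractors: every shot except the current one and the true next one.
        pool = [s for j, s in enumerate(shots) if j not in (i, i + 1)]
        if len(pool) < num_candidates - 1:
            continue  # video too short to fill the candidate set
        candidates = rng.sample(pool, num_candidates - 1) + [positive]
        rng.shuffle(candidates)
        samples.append(PseudoSample(shots[i], candidates, candidates.index(positive)))
    return samples
```

For example, `make_pseudo_samples([30, 75, 120, 200, 260], 300)` splits a 300-frame video into six shots and yields five samples, one per cut. In practice the cut positions would come from a shot-boundary detector, and the frame ranges would be replaced by the visual features the recommendation model consumes.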

Pseudo datasets bridge the domain gap.



Comparison without pseudo dataset
Multiple factors could compound and intensify the domain gap. (Left) Without any labels, simply by leveraging regular videos, a model trained on the proposed pseudo-labeled dataset outperforms one trained on a professionally labeled dataset from a different domain. (Right) Training the model on the pseudo dataset created from videos of the same domain (video scene and type) achieves the best performance.

Citation

@inproceedings{lee2024_multicam_recom,
  author = {Lee, Kuan-Ying and Zhou, Qian and Nahrstedt, Klara},
  title = {Pseudo Dataset Generation for Out-of-domain Multi-Camera View Recommendation},
  booktitle = {IEEE Visual Communications and Image Processing (VCIP)},
  year = {2024},
}

Acknowledgement -- website template adopted from Jon Barron