Predictive Visions: Exploring the Potential of Video Predictive Models
Sangmin Lee (University of Illinois Urbana-Champaign, USA)
■ Abstract
Video predictive models have emerged in the field of computer vision, enabling a wide range of exciting applications. In this talk, we explore the potential of these models, which aim to predict future video frames from previous frames in a sequence. A key advantage of video predictive models is that they can be trained in a self-supervised manner, requiring only raw video data rather than labeled data. We present three compelling applications that showcase the capabilities of these models. First, we delve into long-term video prediction, the most direct application of video predictive models. Here, we address the challenge of forecasting far into the future from limited input sequences. By introducing an external memory module and a two-phase training strategy, we demonstrate long-term future prediction with improved motion and object shapes. Second, we explore the use of video predictive models in assessing VR sickness, where we exploit their self-supervised properties. By training the model on comfortable (non-sickness-inducing) videos and measuring the mismatch between predicted and actual frames, we can effectively quantify the level of motion sickness induced by VR content. Finally, we present a video anomaly detection framework that leverages the same self-supervised nature of video predictive models. By learning normal patterns and detecting deviations from them, we can identify abnormal events in videos without explicit anomaly examples during training. This talk highlights the potential of video predictive models to push the boundaries of video understanding, with promising applications in domains such as predictive systems, virtual reality, and video surveillance. By harnessing the power of self-supervised learning, these models pave the way for more robust and efficient video analysis systems.
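As a concrete illustration of the mechanism shared by the VR sickness and anomaly detection applications, the sketch below trains a small frame predictor on normal video only and then scores new frames by their prediction error. This is a minimal, illustrative sketch under assumed details: the tiny convolutional predictor (FramePredictor), the context length k, and the synthetic clips are placeholders, not the models or data discussed in the talk.

```python
# Minimal sketch: self-supervised frame prediction, with prediction error used
# as an anomaly / discomfort score. Architecture and data are illustrative only.
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    """Predicts the next frame from the previous k frames stacked on the channel axis."""
    def __init__(self, k=4, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(k * channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, context):          # context: (B, k*C, H, W)
        return self.net(context)         # predicted next frame: (B, C, H, W)

def prediction_errors(model, clip, k=4):
    """Per-frame MSE between predicted and actual frames; higher means more anomalous."""
    errors = []
    for t in range(k, clip.shape[0]):
        context = clip[t - k:t].reshape(1, -1, *clip.shape[-2:])
        pred = model(context)
        errors.append(torch.mean((pred - clip[t:t + 1]) ** 2).item())
    return errors

if __name__ == "__main__":
    torch.manual_seed(0)
    k, model = 4, FramePredictor()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Toy stand-in for "normal" / comfortable training clips (static frames here).
    normal_clip = torch.rand(1, 1, 64, 64).repeat(20, 1, 1, 1)  # (T, C, H, W)
    for step in range(50):               # self-supervised: targets are future frames
        t = torch.randint(k, normal_clip.shape[0], (1,)).item()
        context = normal_clip[t - k:t].reshape(1, -1, 64, 64)
        loss = torch.mean((model(context) - normal_clip[t:t + 1]) ** 2)
        opt.zero_grad(); loss.backward(); opt.step()

    # At test time, frames the model fails to predict receive high scores.
    scores = prediction_errors(model, normal_clip, k)
    print("per-frame prediction-error scores:", [round(s, 4) for s in scores])
```

In both applications described in the abstract, frames with high prediction error would be flagged, for example by thresholding or normalizing these scores, as sickness-inducing VR content or as anomalous events.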
■ Bio
Sangmin Lee is a Postdoctoral Researcher in the Department of Computer Science at the University of Illinois Urbana-Champaign (UIUC). He received his Ph.D. from KAIST and his B.S. from Yonsei University. His research interests lie in expanding machine capabilities through multimodal perception and minimal supervision. He investigates multimodal learning to comprehensively leverage visual, language, audio, and physiological signals for holistic reasoning. He also explores self-supervised learning to derive effective feature representations even from weakly labeled or unlabeled data. Building on these foundations, his current research focuses on developing socially intelligent machines that can understand and seamlessly interact with humans in social contexts. His research has been recognized with a Samsung HumanTech Paper Award and as an ICIP Best Paper finalist, and he won first place in the ad-hoc video search task at the VBS international competition. He has published a number of research papers in top-tier AI conferences and journals.