Over a billion new images and videos are uploaded to the Internet every single day. Search engines allow combing through this sea of data and enable access to unstructured collections of casual photos and videos. Much of this visual media is captured by consumer cameras. Today, people are inseparable from their smartphones, bringing them to work, parties, family gatherings, vacations, etc. The purpose of smartphones has long extended beyond communication to visually documenting our lives. Recently, wearable cameras, such as the popular GoPro, have become increasingly common. Together with smartphone cameras they form a huge collection of ``social cameras'', with which people capture almost every imaginable activity. Although many of these activities are rather casual and of low general interest, there are also many unique and interesting activities, many of which are captured simultaneously by multiple devices. Social cameras form a new type of media with unique properties. They are unstructured, in the sense that, generally speaking, we do not know their exact locations or their external or internal parameters. The camera network may consist of various types of cameras. Large portions of their footage are boring, repetitive, and in general of low interest. The data is unsynchronized, and the imaging quality is often rather low. In our research, we will deal with some of these issues, which are characteristic of social cameras.
The high-level goal of the research proposed herein is to develop novel applications that leverage the large amounts of visual data present in photos and video streams captured by today's abundant still and video cameras, including social cameras and instrumented cameras (fixed-location webcams and surveillance cameras). More specifically, in this proposal, we plan to focus our research activity on three areas: (1) extracting elements from hand-held or wearable video cameras, and compositing the extracted elements over novel backgrounds, which themselves originate from mobile video cameras; (2) analyzing spatio-temporal events captured by multiple social cameras; and (3) tackling the challenging problem of fusing multiple video streams with 3D models, in a manner that provides novel and intuitive ways of browsing such video streams.
This project aimed to develop new algorithms and applications using data from static and dynamic cameras. Specifically, the project focused on hand-held image processing, micro-video understanding, and proactive 3D analysis in videos, with particular attention to the characteristics of data from social cameras. The project has achieved a series of innovative research results, publishing 11 papers in top international journals and conferences such as ACM TOG, IEEE TVCG, SIGGRAPH, SIGIR, and CVPR, and filing 10 patents that have been granted. Meanwhile, the project has supervised 2 doctoral students, 10 master's students, and 1 postdoctoral researcher, and has hosted 5 academic conferences.
1. Huayong Xu, Yangyan Li, Wenzheng Chen, Dani Lischinski, Daniel Cohen-Or, Baoquan Chen. A Holistic Approach for Data-driven Object Cutout. ACCV 2016.
2. Qingnan Fan, Jiaolong Yang, Gang Hua, Baoquan Chen, David Wipf. Revisiting Deep Intrinsic Image Decomposition. CVPR 2018.
3. Meng Liu, Xiang Wang, Liqiang Nie, Xiangnan He, Baoquan Chen, Tat-Seng Chua. Towards Micro-Video Understanding by Joint Sequential-Sparse Modeling. ACM MM 2017.
4. Meng Liu, Xiang Wang, Liqiang Nie, Qi Tian, Baoquan Chen, Tat-Seng Chua. Cross-modal Moment Localization in Videos. ACM MM 2018.
5. Meng Liu, Xiang Wang, Liqiang Nie, Xiangnan He, Baoquan Chen, Tat-Seng Chua. Attentive Moment Retrieval in Videos. ACM SIGIR 2018.
6. Bin Wang, Guofeng Wang, Andrei Sharf, Yangyan Li, Fan Zhong, Xueying Qin, Daniel Cohen-Or, Baoquan Chen. Active Assembly Guidance with Online Video Parsing. IEEE VR 2018.