Data Fusion of Multi-modal Data

Video contains not only visual information but also audio. This multi-modal information helps in understanding content and is more descriptive than visual information alone. How best to fuse multi-modal information remains an open question, and answering it is a first step toward general artificial intelligence. In addition, many different kinds of visual sensors provide complementary information. Fusing this heterogeneous information reduces the uncertainty of the estimation model. In our lab, we research the integration of data from multiple sensors with deep learning models.
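To illustrate why fusing complementary measurements reduces uncertainty, here is a minimal sketch of classical inverse-variance weighting. It is not the deep learning approach used in the lab; the function name `fuse_estimates` and the assumption that each sensor reports an independent estimate with a known variance are ours, chosen only for illustration.

```python
def fuse_estimates(estimates):
    """Fuse independent (mean, variance) estimates of the same quantity.

    Illustrative assumption: each sensor's error is independent and its
    variance is known. Each estimate is weighted by 1 / variance.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(var <= 0 for _, var in estimates):
        raise ValueError("variances must be positive")

    weights = [1.0 / var for _, var in estimates]
    total_weight = sum(weights)

    # Weighted average of the means; more certain sensors count more.
    fused_mean = sum(w * mean for w, (mean, _) in zip(weights, estimates)) / total_weight
    # The fused variance is never larger than the smallest input variance.
    fused_var = 1.0 / total_weight
    return fused_mean, fused_var


if __name__ == "__main__":
    # Two hypothetical sensors measuring the same distance in meters.
    mean, var = fuse_estimates([(10.0, 4.0), (12.0, 1.0)])
    print(mean, var)
```

With these two hypothetical readings, the fused variance (0.8) is lower than that of the better sensor alone (1.0), which is the sense in which fusion reduces uncertainty.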

Generative Model with Multi-modal Data

A generative model learns a data distribution and can generate unseen images from it. Such models have been widely researched in the computer vision community because they are useful in many applications. A generative model can take an input image as a condition and modify the image based on that condition. Similarly, one can generate novel artworks given an existing artwork and another modality such as a text message or sound. Another application is colorization: the input is a grayscale image, and the model outputs a colorized image.

3D Computer Vision with Deep Learning

Technical advances in 3D printing, virtual reality, and augmented reality have greatly increased interest in handling three-dimensional shapes, including three-dimensional object synthesis and reconstruction, which have been studied in depth in the computer vision community. The emergence of neural networks and the creation of large-scale three-dimensional object datasets have inspired researchers to revisit three-dimensional object representation learning and synthesis.

Dynamic Vision Sensor

A dynamic vision sensor is a next-generation vision camera that mimics the human eye in order to capture motion. Unlike a conventional camera, it reports changes at individual pixel locations as event data with microsecond resolution. As a result, it has low latency and low power consumption, and it is robust to motion blur. With these advantages, it has great potential in AR/VR applications and autonomous driving.
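As a rough sketch of what event data can look like, the example below represents each event as a timestamp in microseconds, a pixel location, and a polarity, then accumulates the events from a time window into a frame. The `Event` layout and the `accumulate_events` helper are hypothetical names chosen for illustration; real sensors and their drivers define their own formats.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """One hypothetical event: when and where brightness changed."""
    t_us: int      # timestamp in microseconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 for brighter, -1 for darker


def accumulate_events(events: List[Event], width: int, height: int,
                      t_start_us: int, t_end_us: int) -> List[List[int]]:
    """Sum event polarities per pixel over the window [t_start_us, t_end_us)."""
    frame = [[0] * width for _ in range(height)]
    for ev in events:
        if t_start_us <= ev.t_us < t_end_us:
            frame[ev.y][ev.x] += ev.polarity
    return frame


if __name__ == "__main__":
    events = [
        Event(10, 0, 0, +1),
        Event(20, 0, 0, +1),
        Event(30, 1, 1, -1),
        Event(5000, 1, 0, +1),  # falls outside the 1 ms window below
    ]
    for row in accumulate_events(events, width=2, height=2,
                                 t_start_us=0, t_end_us=1000):
        print(row)
```

Only pixels where something changed produce data, so a static scene yields few events. Accumulating into frames is one simple way to feed such data to conventional image models, although it discards some of the fine timing information.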

Machine Perception

Machine perception has been widely researched in the computer vision community because of its importance in many real-life applications. Recent breakthroughs brought by deep learning have significantly increased the use of visual machine perception. In particular, hand/body pose estimation, object detection, object recognition, and pixel-wise segmentation have opened the door to many real-life applications.

Novel & Future View Synthesis

In computer vision, view synthesis has been used to apply changes in lighting and viewpoint to single-view images of rigid and non-rigid objects. In real-life applications, synthetic views can be used to help predict unobserved part locations and also improve the performance of object grasping with manipulators and the path planning of an autonomous driving system.

Large-Scale Dataset Curation

Curating large-scale datasets and developing efficient annotation methods are crucial for deep learning, since optimizing deep neural networks requires large amounts of data. For this reason, model performance is positively correlated with the size and quality of the dataset.