Generative Model with Multi-modal Data

A generative model is a model that can generate unseen images by learning data distribution. It has been widely researched in the computer vision community due to its usefulness in many applications. The generative model can take an input image as a condition and modify the image based on the condition. Similarly, one can generate novel artworks given existing artwork and specific modalities such as text messages or sound. Another application is colorization. For example, the input image can be a grayscale image and the model outputs the colorized image.

3D Computer Vision with Deep Learning

Technical advancement of 3D printing, virtual reality, and augmented reality has greatly increased the interest in handling three-dimensional shapes such as three-dimensional object synthesis and reconstruction, which has been deeply studied in computer vision communities. The emergence of neural networks and the creation of large-scale three-dimensional object datasets inspired researchers to rediscover three-dimensional object representation learning and synthesis.

Multi-modal Representation Learning

Video contains not only visual information but also sound information. This multi-modal information helps understanding content, and it is more descriptive than visual information. Fusing this multi-modal information is an open question, and it is the first step toward general artificial intelligence. Additionally, there are a lot of different kinds of visual sensors which provide complementary information. Fusing this heterogeneous information reduces the uncertainty of the estimation model. In our lab, we research the integration of multiple sensor data with deep learning models.

Large Scale Dataset Curation

Large-scale dataset curation and developing efficient annotation methods are crucial for deep learning since deep learning requires a large-scale dataset to optimize the deep neural networks. For this reason, the performance of the model has a positive correlation with the size and quality of the dataset.

Dynamic Vision Sensor

Dynamic vision sensor is the next generation of vision camera, which mimics human eyes to visualize motions. Unlike a conventional camera, it locates individual pixel locations in microseconds as event data. Therefore, it has low latency and low power consumption, and it is also robust from motion blur, unlike conventional cameras. With these advantages, it has huge potential usages in AR/VR applications and autonomous driving.

Machine Perception

Machine perception has been widely researched in the computer vision community due to its importance in many real-life applications. A recent breakthrough in machine perception by introducing deep learning significantly increased the usage of visual machine perception. In particular, hand/body pose estimation, object detection, object recognition, and pixel-wise segmentation have been opened the door to many real-life applications.

Novel & Future View Synthesis

In computer vision, view synthesis has been used to apply changes in lighting and viewpoint to single-view images of rigid and non-rigid objects. In real-life applications, synthetic views can be used to help predict unobserved part locations and also improve the performance of object grasping with manipulators and the path planning of an autonomous driving system.

Computer Vision Lab
Department of Artificial Intelligence, Korea University
603, Woojung Hall of Informatics, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul