Computer Vision; Multi-modality; Generative Models; Structure from Motion; Multi-view Stereo; Localization and Mapping; Augmented Reality; Virtual Reality.
Generate realistic talking video from an image and audio