ModaVerse: Efficiently Transforming Modalities with LLMs
Paper
• 2401.06395 • Published
Note Multimodal LLM. The LLM acts as an agent and is bound to external generation models, which produce the generated outputs.
Note Approach A didn't work well, so the authors conclude that speech tokens cannot be treated as a new language.
Note Uses HuBERT speech tokens directly in the LLM. Trains a GAN vocoder (HiFi-GAN) for decoding.
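
A rough sketch of what "using speech tokens directly in the LLM" could look like at the vocabulary level. This is an illustration under assumptions, not the paper's code: the `<sp_N>` token naming, the toy text vocabulary, the unit count of 100, and the collapsing of repeated units are all hypothetical choices. The vocoder (HiFi-GAN) step that turns predicted units back into audio is not shown.

```python
from itertools import groupby

def extend_vocab(text_vocab, num_units):
    """Append one new token per discrete speech unit after the text tokens.

    Assumes text_vocab ids are contiguous and start at 0 (hypothetical setup).
    """
    vocab = dict(text_vocab)
    offset = len(vocab)
    for u in range(num_units):
        vocab[f"<sp_{u}>"] = offset + u  # hypothetical token naming
    return vocab

def units_to_token_ids(units, vocab, dedup=True):
    """Map a sequence of speech unit ids to LLM token ids.

    With dedup=True, consecutive repeats of the same unit are collapsed,
    which is an assumption here and not something the note specifies.
    """
    if dedup:
        units = [u for u, _ in groupby(units)]
    return [vocab[f"<sp_{u}>"] for u in units]

# Toy text vocabulary standing in for a real tokenizer's vocabulary.
text_vocab = {"<pad>": 0, "hello": 1, "world": 2}
vocab = extend_vocab(text_vocab, num_units=100)

# Unit ids such as a speech quantizer might emit, one per audio frame.
units = [5, 5, 5, 17, 17, 3]
token_ids = units_to_token_ids(units, vocab)
print(token_ids)  # [8, 20, 6]
```

In a real model, the embedding table and output layer would also need to grow by `num_units` rows so the new ids have trainable parameters.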