Research Scientist Intern, Speech/Audio Generation
Responsibilities:
- Conduct cutting-edge research and development in speech/audio foundation models.
- Research, design, develop, and evaluate generative models.
- Conduct research on integrating large language modeling into speech and audio generation (e.g., neural tokenizers).
- Identify key research areas and contribute to innovative speech/audio models.
- Design and curate large-scale, high-quality datasets in collaboration with multiple teams.
Qualifications:
- Major in computer science, mathematics, engineering, a related field, or equivalent professional experience.
- 2+ years of internship or full-time experience in one or more areas of machine learning and deep learning, including but not limited to:
  - Speech/vocal generation (including text-to-speech, singing voice synthesis, prosody transfer, etc.)
  - Audio generation (including text-to-music, etc.)
  - Large-scale speech/audio self-supervised representation learning and foundation models
  - Large Language Model pre-training and fine-tuning
- Deep knowledge of deep learning and generative models (Diffusion, Flow Matching, AR Transformer, Mamba, VAE, GAN, etc.).
- Self-driven, innovative, collaborative, with strong communication and presentation skills.
Preferred Qualifications:
- Ability to work collaboratively in a fast-paced startup environment.
- Strong publications in top-tier ML/Speech conferences/journals (e.g., NeurIPS, ICML, ICLR, ICASSP, INTERSPEECH).
- Deep understanding of Large Language Models.
- Familiarity with different variants of diffusion models, such as flow matching and DDIM.
- Familiarity with large-scale distributed training.
- Knowledge of engineering principles and best practices.
- High proficiency in algorithms and programming; strong coding skills in Python.