Job Responsibilities:
- Develop multimodal foundation models that handle data such as images, text, and speech; own the overall design and optimization of the network architecture;
- Track the latest research results and technological advances in the multimodal field; stay familiar with multimodal models such as BLIP, LLaVA, and MiniGPT-4;
- Build and maintain multimodal datasets; drive the deployment and adoption of multimodal large models in business applications.
Requirements:
- Master's degree or above in Computer Science, Artificial Intelligence, Machine Learning, or a related field.
- Proficient in Python/C++ programming; skilled with PyTorch and similar frameworks.
- Familiar with the processing and representation of multimodal data, such as fusing images with text, speech, and other modalities.
- Familiar with multimodal model training and with classical architectures such as BLIP-2.
Preferred qualifications (one or more of the following):
- Strong research ability, with papers published in CCF-B or higher-ranked conferences or journals.
- Strong competition record, with an award from ACM, NOI, NOIP, or other programming competitions.
- Strong performance in academic competitions, or top leaderboard rankings on well-known benchmark datasets.
- Strong coding skills, with experience on high-quality medium-to-large-scale projects or personal open-source projects.
- Strong research spirit, with deep exploration and understanding of the languages, systems, and algorithms you work with.
- Excellent academic performance with a high GPA.
- Self-driven and hardworking.