Job Responsibilities:
* Develop multimodal foundation models that handle image, text, speech, and other modalities; own the overall design and optimization of the network architecture and framework;
* Track the latest research results and technological advances in multimodal learning; familiarity with multimodal models such as BLIP, LLaVA, and MiniGPT-4;
* Build and maintain multimodal datasets, and drive the deployment and adoption of large multimodal models in business applications.
Job Requirements:
* Master's degree or above in Computer Science, Artificial Intelligence, Machine Learning, or a related field.
* Proficient in Python/C++ programming; solid command of PyTorch and other deep-learning frameworks.
* Familiar with the processing and representation of multimodal data, e.g., fusing images with text, speech, and other modalities.
* Experienced in training multimodal models; familiar with classical model architectures such as BLIP-2.
Preferred Qualifications (one or more of the following):
* Strong research ability, with papers published in CCF-B or higher conferences or journals.
* Strong competition record, with awards from ACM, NOI, NOIP, or other competitive programming contests.
* Strong record in academic competitions or top leaderboard rankings on well-known benchmark datasets.
* Strong coding skills, with experience on high-quality medium-to-large-scale projects or personal open-source projects.
* Strong research mindset, with deep exploration and understanding of chosen languages, systems, and algorithms.
* Excellent academic performance and a high GPA.
* Self-driven and hardworking.