Responsibilities:
- Design and implement large-scale dataset construction strategies that ensure data quality and diversity and meet the training needs of various AI models.
- Lead the big data and web scraping team, overseeing data collection, cleaning, annotation, and preprocessing workflows.
- Apply deep learning and machine learning techniques to optimize model training, including model architecture design, hyperparameter tuning, and performance evaluation.
- Collaborate with cross-functional teams, including engineering, product, and business teams, to identify and prioritize dataset and model requirements.
- Monitor performance metrics during the model training process to ensure model accuracy and efficiency.
- Promote the adoption of the latest data science methods and technologies within the team so that team members' skills stay up to date.
Requirements:
- Bachelor's degree or higher in Computer Science, Data Science, or Information Technology.
- A minimum of 2 years of experience in data science, machine learning, or a related domain, including at least 2 years focused on large-scale model training.
- Strong programming skills in Python or R, and familiarity with deep learning frameworks such as TensorFlow and PyTorch.
- Proven track record of leading and managing data science projects, with the ability to mentor and guide team members on complex tasks.
- In-depth knowledge of dataset construction, including best practices for data scraping, cleaning, and preprocessing.
- Practical project experience in natural language processing (NLP), computer vision (CV), or recommendation systems.
- Experience publishing papers in relevant fields or presenting at reputable conferences, along with expertise in training models in large-scale distributed computing environments.