Job Responsibilities:
- Design, develop, and oversee the implementation of processes and tools to assess the quality of data used to train large language models (LLMs).
- Create metrics and KPIs to evaluate data accuracy, consistency, and relevance.
- Work with Engineering teams to develop automated checks that identify inconsistencies and potential issues in the training data.
- Lead the integration of quality processes into existing data pipelines.
- Collaborate with Data Scientists to scrutinize annotation data and develop strategies for continuous data quality improvement.
- Establish feedback loops and ensure that data quality aligns with annotation guidelines.
- Engage with Machine Learning Engineers to determine how variations in data quality influence LLM performance.
- Recommend adjustments to data collection, preprocessing, and utilization based on model performance analysis.
- Stay current with the latest trends and advancements in data quality management.
- Recommend and implement enhancements to our quality processes, tools, and methodologies based on industry best practices.
Requirements:
- 7+ years of experience designing, testing, implementing, or consulting on data quality management for machine learning model training.
- Strong understanding of machine learning principles, especially in the context of NLP and LLMs.
- Foundational knowledge of relevant programming languages and tools (e.g., Python, SQL).
- Demonstrated experience in project management and cross-functional collaboration.
- Exceptional analytical, problem-solving, and organizational skills.
- Proven ability to think strategically about business, product, and technical challenges.
- Strong verbal and written communication skills with the ability to work effectively across internal and external organizations and virtual teams.