Job Responsibilities:
- Work with technical and non-technical stakeholders to design and execute research testing how humans evaluate large language model (LLM) features.
- Conduct research with humans to validate and create automated benchmarks for LLM evaluations.
Requirements:
- PhD or advanced degree in Computer Science, Machine Learning, Cognitive Science, Psychology, Economics, or a similar discipline.
- Minimum of 2 years of relevant experience in large language model research.
- Fluent in at least one statistical programming language, such as Python or R.
- Strong understanding of machine learning principles, especially in the context of LLMs.
- Knowledgeable about LLM evaluation techniques, such as human evaluation and automated benchmarks.
- Able to own and pursue a research agenda, including choosing impactful research problems and autonomously carrying out projects.
- Demonstrated background in collecting data from human participants (e.g., surveys, experiments), with an understanding of data quality and validity.
- Strong communication skills, with the ability to work effectively across internal and external organizations and virtual teams.