As a Data Scientist in DPPCC, you will break new ground in the emerging field of privacy engineering by working on impactful, scalable technologies that help unlock the potential of data for public good.
Privacy-enhancing technologies (PETs) are not new, but they have been challenging to implement in real use cases, whether within or outside of government. The team aims to bridge the gap between research and adoption by experimenting and piloting with agencies to assess the feasibility and utility of each technology and to prepare it for wider adoption in government, whether through building products, evaluating current solutions, developing clear usage guidelines, or education.
The team’s portfolio currently covers evolving technologies such as LLM-powered approaches to PII detection and redaction (Free-Text Anonymization), Synthetic Data Generation, Homomorphic Encryption and Differential Privacy, and we plan to open new projects in other technologies (e.g., Federated Learning, Confidential Computing, Data Clean Rooms).
We are seeking a Data Scientist to:
(a) gather information on existing solutions for a PET,
(b) run experiments to benchmark and evaluate potential optimal solutions,
(c) design implementable workflows/solutions that agencies can use,
(d) synthesize research on advancements, international case studies, and benchmarks to develop materials that guide agency adoption and meet agencies' needs, and
(e) develop and deploy cloud-based implementations of optimal solutions.
Requirements:
- Recently completed a Bachelor’s degree or diploma in Computer Science, Data Science or a relevant field of study
- Has a sufficient foundation in AI/ML model development to run experiments according to user needs
- Experience with quantitative analysis, including machine learning frameworks and tools (e.g., scikit-learn, TensorFlow, PyTorch)
- Proficiency in at least one programming language, such as Python or R
- Experience with cloud platforms (e.g., AWS, GCP, Azure)
- We value candidates with T-shaped competencies. Adjacent expertise in related areas like software development (e.g., cloud infrastructure, frontend technologies, backend with REST APIs) is a big plus.
- Has an inclination to read and synthesize research work, even in a new domain.
- Curiosity, willingness to learn and explore new domains, especially in data privacy and protection. While experience with privacy-enhancing technologies is a plus, we are happy for you to learn on the job.
- Inclination to work in a collaborative environment
While this role is primarily for a data scientist, we would highly value a candidate with software engineering competencies.
- Machine learning frameworks and tools (e.g., scikit-learn, TensorFlow, PyTorch)
- Proficiency in at least one programming language (e.g., Python, R)
- Experience with cloud infrastructure (e.g., AWS, GCP, Azure)
- Optional but highly valued: Frontend technologies, backend with REST APIs