Machine Learning Engineer
1 week ago
We are hiring a Machine Learning Engineer for a technology client on a yearly renewable contract. The team is building cutting-edge tools and infrastructure to drive innovation and automation throughout the organisation. In this role you will contribute to the creation of a new compute layer using Ray, and you will help drive the set-up of the infrastructure for Ray on Kubernetes and its integration with the existing data system.
What you’ll do:
- Deliver high-quality AI infrastructure solutions: You will work with the Machine Learning Platform team to design and develop the infrastructure to support Ray for distributed data processing and model training. You will develop using GitOps to ensure the reproducibility of the system's cloud infrastructure on different Kubernetes clusters.
- Develop observability solutions for Ray: You will be responsible for developing and integrating monitoring and alerting within the client’s monitoring stack powered by Datadog, Prometheus and Grafana. You will also contribute to the creation of runbooks and DevOps guides.
- Support the data science community in adopting Ray: You will work with our product team to socialise Ray's use. You will be responsible for supporting users in running their jobs on the Ray clusters.
What you’ll need:
- In-depth knowledge of MLOps with a solid understanding of distributed computing for data processing. Knowledge of Ray is preferable, but other frameworks, such as Dask, Modin, Beam, Horovod, and DeepSpeed, are also valued.
- An understanding of, and ideally experience with, image generation models such as Stable Diffusion and FLUX.
- An understanding of ML inference using LLMs and other generative models.
- Good knowledge of Python and ML ecosystems.
- Strong understanding of developing and deploying systems on Kubernetes.
- Previous experience with GitOps solutions like ArgoCD is preferred. Good knowledge of Helm and Kustomize is also valued.
- Good DevOps background; experience with Infrastructure as Code (IaC) tools such as Terraform is preferred.
- At least 3 years of relevant Machine Learning Engineering experience.
If this role sounds like an ideal job move, please hit the apply button with your latest resume. Alternatively, you can email your resume to [email protected]
We regret that only shortlisted candidates will be notified.
CEI No: R1659595 / EA No: 07C3147