Key responsibilities:
- Review ML models for compliance with overall platform governance principles such as versioning, data and model lineage, and code best practices, and provide feedback to data scientists on potential improvements.
- Develop pipelines for the continuous operation, feedback and monitoring of ML models, applying CI/CD best practices from the MLOps domain. This can include monitoring for data drift, triggering model retraining and setting up rollbacks.
- Optimize AI development environments (development, testing, production) for usability, reliability and performance.
- Maintain a strong working relationship with the infrastructure and application development teams in order to understand the best method of integrating ML models into enterprise applications (e.g., exposing trained models as APIs).
- Work with data engineers to ensure that data storage (data warehouses or data lakes), the data pipelines feeding these repositories, and the ML feature or data stores are working as intended.
- Evaluate open-source and commercial AI/ML platforms and tools for feasibility of usage and integration from an infrastructure perspective. This also involves staying up to date on new developments, patches and upgrades to the ML platforms in use by the data science teams.
Key requirements:
- Proficiency in Python for both ML and automation tasks.
- Good knowledge of Bash and the Unix/Linux command-line toolkit is a must-have.
- Hands-on experience building CI/CD pipelines orchestrated by Jenkins, GitLab CI, GitHub Actions or similar tools is a must-have.
- Knowledge of OpenShift / Kubernetes is a must-have.
- Good understanding of ML libraries such as pandas, NumPy, H2O, or TensorFlow.
- Knowledge of the operationalization of data science projects (MLOps) using at least one of the popular frameworks or platforms (e.g., Kubeflow, AWS SageMaker, Google AI Platform, Azure Machine Learning, DataRobot, Dataiku, H2O, or DKube).
- Knowledge of a distributed data processing framework, such as Spark or Dask.
- Knowledge of a workflow orchestrator, such as Airflow or Control-M.
- Knowledge of logging and monitoring tools, such as Splunk and Geneos.
- Experience in defining the processes, standards, frameworks, prototypes and toolsets in support of AI and ML development, monitoring, testing and operationalization.
- Experience with ML operationalization and orchestration (MLOps) tools, techniques and platforms. This includes scaling the delivery of models, managing and governing ML models, and managing and scaling AI platforms.
- Knowledge of cloud platforms (e.g., AWS, GCP) would be an advantage.