Responsibilities:
• Technical Leadership: Recommend and advocate for MLOps design patterns, best practices, and tooling in an enterprise setting, and lead their implementation.
• Technical Debt Resolution: Resolve technical debt in ML projects currently in production and incorporate MLOps best practices.
• Model Deployment: Deploy machine learning models and enhance automation in the deployment process.
• Automation and Checks: Implement and manage automation pipelines, and set up the tests needed for continuous integration and deployment.
• Monitoring and Maintenance: Monitor data drift, model performance, and other metrics in production, and work with data scientists to retrain models and set up retraining pipelines.
• Security & Compliance: Ensure that ML systems comply with security standards and best practices in a cloud environment.
• Collaboration: Work closely with:
o Data engineers and data modellers to understand the data pipelines and data models.
o Data scientists to understand experimental ML models and ensure that models are integrated and operationalized effectively.
o Software engineers/IT to deploy infrastructure.
Required Skillset:
• 7-10 years of overall experience in software engineering, data engineering, or MLOps, preferably in enterprise-level, complex matrix organizations.
• Experience in setting up MLOps pipelines, systems and processes from scratch.
• Proven experience with AWS (Athena, Glue, ECS, EKS, VPC, etc.), and with Amazon SageMaker specifically, for deploying machine learning models, enhancing automation, and implementing the checks needed for continuous improvement.
• Experience developing and managing CI/CD pipelines (Azure Pipelines preferred) to automate model deployment, testing, and integration.
• Experience orchestrating and monitoring data pipelines and ML workflows to ensure timely execution (Apache Airflow preferred).
• Strong experience with Python and Bash for automating ML workflows, and with SQL and PySpark for feature engineering.
• Familiarity with IaC tools such as Terraform or AWS CloudFormation for managing cloud infrastructure.
• Knowledge of security practices and compliance requirements for managing data and models in the cloud.
• Knowledge of the data science/machine learning lifecycle and of frameworks such as scikit-learn, PyTorch, and TensorFlow.
Good to Have:
• Experience with Azure Synapse and Azure ML Studio.
• Experience with Databricks.
• Experience with dbt.