Job Description Summary
As part of the AI Platform, within Data as a Service (DaaS), our passion is to advance AI capabilities and deliver AI-powered solutions to optimize processes, maximize productivity and increase value for our customers and colleagues. We are seeking a highly skilled and experienced Senior AI/ML Cloud Engineer to join our team and drive the delivery of cutting-edge enterprise capabilities and solutions on the Azure AI Platform. In this role, you will be at the forefront of designing, implementing, and maintaining cloud-based infrastructure specifically tailored to support our ambitious AI initiatives. This position also calls for strong leadership skills to mentor team members and contribute to the vision and strategy for Generative AI.
Key Responsibilities:
- Infrastructure Design and Implementation: Design and deploy scalable, reliable, and efficient cloud-based solutions that meet the complex requirements of our AI projects. This involves selecting appropriate Azure services, configuring resources, and optimizing performance to ensure seamless integration with our AI models and applications.
- AI Application Development and Operation: Integrate AI models into production environments using Azure AI Platform tools such as Azure ML, Azure AI Studio, OpenAI, Vector Databases, and Databricks. Manage model inventory and artifact life cycles, ensuring proper documentation, version control, and compliance with industry standards.
- Cloud Services Management: Implement and manage cloud-based services and infrastructure, including compute, storage, access, and networking. Oversee disaster recovery and backup solutions to ensure data integrity and availability.
- Automation and Orchestration: Build automation and orchestration to streamline processes and reduce manual intervention for AI applications. Utilize tools like Terraform, Ansible, or Kubernetes to automate deployment and management processes.
- API Management: Develop and manage APIs for online endpoints for model inferencing, leveraging cloud services to automatically manage infrastructure, scaling, and availability, ensuring efficient and cost-effective model inferencing.
- Collaboration: Partner with engineers to design scalable and resilient platforms that leverage the full potential of cloud services. Cooperate with architects, data scientists, platform engineers, AI security teams, and other key stakeholders to ensure alignment and success in our AI endeavors.
- Security and Compliance: Ensure the security and compliance of cloud-based AI solutions. Collaborate with AI security groups to implement necessary controls and safeguards to protect sensitive data and intellectual property.
- Performance Monitoring and Optimization: Proactively monitor and optimize the performance of AI workloads and cloud resources. Analyze resource utilization, identify cost-saving opportunities, and implement strategies like rightsizing instances, utilizing spot instances, and leveraging auto-scaling capabilities.
- Troubleshooting: Identify and resolve issues related to cloud infrastructure and AI applications, ensuring minimal downtime and optimal performance. Conduct root cause analysis, implement corrective actions, and develop preventive measures to avoid future problems.
- Continuous Improvement: Stay up to date with the latest trends and advancements in cloud computing and AI technologies. Take ownership of, and share a vision for, creative patterns, frameworks, and tools that enhance our AI and ML capabilities.
- Technical Guidance and Mentorship: Provide technical expertise and mentorship to team members. Share knowledge and best practices with the team to foster a culture of continuous learning and improvement. Lead project teams and contribute to strategic decision-making processes.
Qualifications:
- Educational Background: Master's degree in Computer Science, Engineering, or a related field. A Ph.D. in ML/AI or analytical sciences is a plus.
- Coding Languages: Proficiency in Python across multiple programming paradigms; competence in SQL and query optimisation. Familiarity with web technologies such as JavaScript, HTML, and CSS is a plus.
- Machine Learning and AI: Deep knowledge of traditional machine learning, from statistics to model building and fine-tuning; hands-on experience with the latest Generative AI modelling, RAG, and prompt engineering.
- Big Data Technologies: Experience with big data technologies such as Hadoop, Spark, and other relevant frameworks.
- Data Engineering and ETL Processes: Knowledge of data engineering and ETL processes.
- Cloud Technology: Extensive experience with Azure Cloud Services, including tools like Azure ML, Azure AI Studio, OpenAI, Vector Databases, Databricks, Databases, Azure Functions, and Azure App Service.
- Containerization and Orchestration: Significant experience with containerization (Docker) and orchestration tools (Kubernetes).
- Infrastructure as Code (IaC): Expertise with IaC tools such as Terraform and Ansible.
- MLOps Practices: In-depth knowledge of CI/CD pipelines and DevOps practices, particularly GitHub Actions.
- Problem-Solving and Collaboration: Excellent problem-solving skills and the ability to work independently and as part of a team. Strong communication skills and the ability to collaborate effectively with cross-functional teams.
- Leadership Skills: Proven leadership experience, with the ability to guide strategic decisions and manage teams effectively.
- Cloud Certifications: Certification in cloud platforms.
- Industry Experience: 10+ years of relevant industry experience in a similar role within the AI or tech industry.
If you are passionate about leveraging cloud technology to advance AI capabilities and enjoy working in a dynamic, collaborative environment, we would love to hear from you. Join us in our mission to harness the power of AI to drive innovation and business success.