Why Join Us?
OCI (Oracle Cloud Infrastructure) AI Infrastructure is at the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. The CoE provides a platform to be part of the AI revolution, architecting customer-centric systems and solving real-world business problems with AI.
Requirements - Technical
You bring proven experience in three or more of the following areas. Experience with AI infrastructure and LLMs is a MUST.
AI Infrastructure Design: Lead the architecture and implementation of AI and HPC infrastructure, including the use of GPUs/TPUs, high-performance networking, and scalable storage solutions to support GenAI/AI/ML workloads
AI Deployment: Experience in deploying large models in production on public clouds (OCI, AWS, Azure, GCP) and hybrid cloud environments, including the use of microservices and containerization (Docker, Kubernetes) to ensure smooth deployment, scaling, and monitoring of AI/ML models in production
AI/ML Tools & Frameworks: Design and implement AI systems using industry-standard training, inferencing, and deployment tools such as Kubeflow, Ray, CUDA, PyTorch, and TensorFlow, ensuring optimal performance in training and deployment. Exposure to scheduling and automation tools such as Slurm and Terraform is desirable.
Large Language Models (LLMs): Expertise in working with closed and/or open-source LLMs (e.g., GPT, BERT, Bloom, LLaMA) and understanding the full AI life cycle, including training, fine-tuning, and deploying these models for inference in production environments.
Performance Optimization: Drive the optimization of AI infrastructure and applications on OCI, focusing on efficiency improvements in computational speed and resource management.
Security & Compliance: Ensure all AI infrastructure and solutions are compliant with industry standards and organizational policies related to security, privacy, and data governance
Operating Systems, Protocols and Tools: Strong Linux skills with hands-on experience in Oracle Linux/RHEL/CentOS, Ubuntu, and Debian distributions, including system administration, package management, and shell scripting. Strong knowledge of networking protocols and technologies (TCP/IP, InfiniBand, RDMA, UDP, HTTP) is a significant advantage. Experience with high-performance storage is desirable.