About the Role
As a business SRE, you'll manage the technical operations of Shopee's core marketplace businesses, including product lines such as Shopee voucher management, Shopee discount/coins management, Shopee selling listing online, Shopee intelligence and data, and more. Our goal is to build and maintain large-scale, robust, and highly efficient distributed systems that maximize availability and performance while minimizing costs. To that end, you will not only contribute to multiple full-stack platforms and solutions but also build your own. The role regularly involves challenges in both technical operations and software engineering. You will dive deep into Shopee's development and business operations cycle to keep systems scalable even as they evolve rapidly. Your responsibilities will span every layer, from business-level service design to optimizing data centers, networks, and operating systems.
Responsibilities
- Continuously improve the marketplace services in the private cloud, including but not limited to stress test automation, capacity management, service autoscaling, disaster recovery, chat operations (ChatOps), knowledge base management, SOP automation, and dynamic service protection.
- Administer and maintain the servers that run marketplace services and all the middleware they depend on.
- Dive deep into Marketplace core product lines, and set up and run proofs of concept to optimize the services running in the private cloud.
- Ensure the reliability of Shopee Marketplace year-round, including during all major campaigns.
- Fun and energetic team culture with strong emphasis on learning, sharing and growth.
- Wide exposure to enable rapid growth in personal skills and career.
- Time split roughly 50:50 between technical operations and software engineering.
Requirements
- Bachelor's degree or higher in Statistics, Mathematics, Computer Science, Information Technology, Programming & Systems Analysis, Engineering, or other related disciplines.
- Minimum 3 years of work experience as a site reliability engineer.
- Experience with site reliability engineering concepts and tools.
- Experience with monitoring tools like Prometheus, Zabbix, Grafana, etc.
- Experience with load balancing tools like LVS, Nginx, OpenResty, HAProxy, etc.
- Experience with container technology such as Docker, Kubernetes, etc.
- Experience with load testing, capacity management, and campaign preparation.
- Good computer science fundamentals: data structures and algorithms, operating systems, computer networking / security, virtualization, containerization, etc.
- Good software engineering and application architecture skills: backend / frontend development, architecture and design patterns, and middleware including caches, databases, queues, file storage, etc.
- Personal traits we are looking for: a fast learner and a good team player; strong analytical and problem-solving skills; the ability to adapt and thrive in a dynamic work environment; passion and a strong sense of ownership.