Job Description
- Responsible for the architectural design and development of Shopee's foundational monitoring platform, including but not limited to the development and optimisation of core components of the monitoring system to support the storage, querying, and operation of massive time-series data.
- Responsible for the platformisation of the monitoring system, ensuring users can use the various kinds of monitoring data appropriately and efficiently.
- Extract value from foundational monitoring data to improve the efficiency of fault detection and diagnosis for business operations.
- Build an automated operations and maintenance system for the monitoring platform to strengthen platform stability through capabilities such as fault recovery and automatic circuit breaking.
- Establish mechanisms for identifying and handling invalid data to optimise the resource costs of the monitoring system.
Requirements
- Bachelor’s degree or higher in Computer Science, Information Technology, Programming & Systems Analysis, Engineering, or other related fields.
- Minimum 5 years of work experience in development-related roles in Linux environments.
- Over 2 years of experience working with monitoring systems.
- Strong programming skills and proficiency in at least one programming language, such as Go or Python.
- Mastery of at least one common backend web framework (e.g., Django/Flask/Gin) along with its design principles.
- Deep understanding of the Linux operating system, and familiarity with basic protocols such as TCP/IP and HTTP and with communication frameworks such as Thrift/gRPC/brpc.
- In-depth understanding of large-scale monitoring system architecture and operation solutions.