Specifications:
- Architect, develop, and maintain our core infrastructure and services on AWS, focusing on high availability, performance, and scalability. Specific AWS services of interest include EC2, RDS, S3, ElastiCache, CloudWatch, Redshift, OpenSearch, and VPC.
- Implement and manage continuous deployment processes to achieve seamless deployment of services with minimal downtime.
- Monitor system performance, identify bottlenecks, and apply necessary optimizations to ensure the smooth operation of our services.
- Develop and maintain automated tools for infrastructure provisioning, configuration, and deployment.
- Work closely with development teams to integrate infrastructure builds and operational best practices into the software development lifecycle.
- Conduct root cause analysis for production errors and implement strategies to prevent future occurrences.
- Manage and optimize network configurations to ensure secure and efficient data flow and access.
- Administer and maintain databases, ensuring their reliability, performance, and security.
- Lead capacity planning efforts to ensure that our infrastructure scales in line with demand while optimizing costs and maintaining performance.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3+ years of service reliability or operations experience running large-scale, high-performance systems and internet services.
- Proven experience as an SRE, DevOps Engineer, or similar role in a cloud-based environment.
- Strong expertise in AWS services and tools.
- Solid understanding of networking principles and of transport- and application-layer protocols, especially TCP/IP, BGP, DNS, TLS, and HTTP/HTTPS.
- Experience with database administration, including performance tuning, backup and recovery processes, and security management.
- Proficiency in scripting languages (e.g., Python, Bash) and automation tools (e.g., Terraform, SaltStack).
- Excellent problem-solving skills and the ability to work independently or as part of a team.
- Strong communication skills and the ability to collaborate effectively with cross-functional teams.
- Preferred - Significant experience in capacity planning and cost management within cloud environments, particularly using Spotinst for cost optimization.
- Preferred - AWS Certified SysOps Administrator - Associate or AWS Certified Solutions Architect - Professional (SAP) certification.
- Fluency in Chinese (reading and writing) would be an additional advantage.