Take on a critical role in shaping the future of a globally recognized firm, and make a direct and significant impact in an environment built for top performers in site reliability.
As a Lead Site Reliability Engineer at JPMorgan Chase within Corporate Technology, Trade Surveillance, you hold a leadership role on your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues they face. You lead resiliency design reviews, break complex problems into digestible work for other engineers, act as the technical lead for medium to large-sized products, and advise and mentor other engineers.
Responsibilities
• Supports applications in the Compliance Technology portfolio that combine batch processing (3,000 jobs per day) with UI availability and data queries.
• Demonstrates and champions site reliability culture and practices, and exerts technical influence throughout your team.
• Leads initiatives to improve the reliability and stability of your team’s applications and platforms using data-driven analytics to improve service levels.
• Collaborates with your team to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with your customers.
• Demonstrates a high level of technical expertise within one or more technical domains, and proactively identifies and solves technology-related bottlenecks in your areas of expertise.
• Acts as the main point of contact during major incidents for your application, and demonstrates the skills to identify and solve issues quickly to avoid financial losses.
• Collaborates with other software engineers and software engineering teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines.
Qualifications and technical skills
• Bachelor’s degree in computer science or a related field
• Formal training or certification in software engineering concepts and 5+ years of experience as a site reliability engineer
• Deeply proficient in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices. You can demonstrate how to implement these practices within an application or platform.
• Proficiency and experience in software applications and technical processes within a technical discipline (e.g., cloud). Proficient with containers and orchestration (ECS, Kubernetes, Docker).
• Proficiency and experience with monitoring and observability tools such as Apica and Splunk, and with SLOs, alerting, and telemetry collection using tools such as Grafana, Dynatrace, Datadog, and CloudWatch.
• Proficiency and experience with continuous integration and continuous delivery tools such as Jenkins, GitHub, and Terraform.
• Experience troubleshooting issues and providing root cause analysis when needed.
• Ability to engage in active technical issues, provide technical solutions, and plan and execute resiliency tests.
• Ability to drive toil reduction by using coding skills to solve our day-to-day issues.
Preferred qualifications, capabilities, and skills
• Good database knowledge of Oracle, and scheduling knowledge (AutoSys/Control-M)
• Fluent in at least one programming language, such as Python or Java
• Availability for weekend support on a rotation basis
To apply for this position, please use the following URL:
https://ars2.equest.com/?response_id=53d44aa8f69021864d1524dc0de3db7f