Citadel’s Site Reliability Engineers (SREs) bring the practices of SRE to the financial trading field, applying innovation and cutting-edge technology to reduce complexity and improve performance. SREs are responsible for taking applications to production, providing early support for applications in development, and ensuring applications run smoothly throughout their lifetime. SREs have a deep understanding of how applications function and are able to change applications to bring them to production quality.
Individuals in SRE will work closely with application development teams in PTE on automation and application refactoring. In some instances, SRE and app dev team members will move between teams to allow maximum cross-pollination.
Depending on the situation, the SRE team may design and deploy a new generation of production infrastructure.
This role will primarily involve working with a cohesive team of engineers, developers, and trade support teams to build world-class systems and the necessary tools to maintain and constantly improve them. The position calls for someone willing to innovate, automate, and continuously use measurements and statistics to improve. Team members have backgrounds in various areas, including software development, networking, UNIX internals, and large-scale systems administration.
Responsibilities:
· Manage and provide technical and non-technical guidance and support for the growth of the SREs on the team and engineers across other teams
· Understand and apply SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements
· Ensure the reliability, availability, and performance of applications
· Own the automation of repetitive tasks and resolution of systematic issues
· Identify and deliver engineering solutions for issues based on root cause analysis
· Own incident management and resolution
· Lead by evangelizing the SRE mindset to other teams
· Provide support and ensure applications are production-ready
· Work across geo-distributed teams
Qualifications:
· Strong background in computer science fundamentals, data structures, algorithms, and distributed systems
· Fully proficient in at least one modern structured programming language (Python or Java)
· Experience in building and leading engineering teams, ideally SRE or Production Engineering
· Comfortable with a range of current software development tools and practices (testing, source control, build systems, CI/CD, etc.)
· A basic working understanding of TCP/IP networking, LAN, and WAN, as well as Linux internals
· Experience building and managing highly reliable large-scale systems is a plus
· Excellent written and verbal communication skills
· Strong entrepreneurial spirit
· A passion for learning, adapting to changing requirements and technology, and inventing new approaches to complex problems
Education:
Bachelor’s or Master’s degree in Computer Engineering, Computer Science, or an allied field.