• Perform work in shifts to provide 24/7 on-site or on-call support.
• Handle incident and problem management.
• Know SRE best practices and adhere to SRE guidelines in day-to-day work.
• Apply root cause analysis techniques to determine the cause of complex system issues and resolve them.
• Perform post-resolution follow-ups to ensure problems have been adequately resolved.
• Communicate application problems and issues to key stakeholders, including management, development teams, end users, and unit leaders.
• Work with onsite and offshore teams across multiple technologies/applications.
• Continuously improve the system, e.g., by removing toil, automating jobs, and tuning performance.
• Proactive management of production services by measuring and monitoring availability, latency, throughput, user journeys and overall system