· Perform work in shifts to provide 24/7 on-site or on-call support.
· Handle incident and problem management.
· Should have knowledge of SRE best practices and be able to adhere to SRE guidelines in daily work.
· Apply root cause analysis techniques to determine the cause of complex system issues and resolve them.
· Perform post-resolution follow-ups to ensure problems have been adequately resolved.
· Communicate application problems and issues to key stakeholders, including management, development teams, end users, and unit leaders.
· Work with onsite and offshore teams across multiple technologies and applications.
· Continuously improve the system, e.g. by removing toil, automating jobs, and tuning performance.
· Proactively manage production services by measuring and monitoring availability, latency, throughput, user journeys, and overall system health.