Location: Singapore/Malaysia
Job Description: We are seeking an experienced software engineer to manage system monitoring and operations for both internal and external projects, with a particular focus on setting up and maintaining the Nagios system. This role works closely with development and operations teams to ensure system availability and accurate monitoring.
Responsibilities:
• Set up, configure, and optimize the Nagios monitoring system
• Develop and maintain custom monitoring plugins to meet business needs
• Monitor networks, servers, and applications in real time to ensure system stability
• Analyze and troubleshoot system performance issues and provide long-term optimization recommendations
• Write and maintain documentation for system architecture, monitoring configurations, and troubleshooting guides
• Collaborate with the team to optimize monitoring frameworks and enhance system automation
• Support development and operations teams with system failure response and troubleshooting
Requirements:
• At least 3 years of experience in system operations or software development
• Proficient in setting up, configuring, and maintaining Nagios monitoring systems
• Familiar with Linux/Unix environments
• Experience with network protocols (e.g., TCP/IP, HTTP, DNS) and network device monitoring
• Ability to write monitoring plugins using Python, Bash, or other scripting languages
• Knowledge of server virtualization and cloud environment monitoring solutions is a plus
• Strong communication skills and ability to work collaboratively within a team
• Excellent problem-solving skills and ability to work under pressure
Preferred Qualifications:
• Experience with other monitoring tools such as Zabbix or Prometheus
• Experience in monitoring and maintaining large-scale system architectures
• Familiarity with automation tools like Ansible, Puppet, or Chef
We Offer:
• Competitive salary and benefits package
• Continuous career development and training opportunities
• A dynamic and innovative work environment