- Analyse and locate root causes, recognize and address systemic factors, and diagnose and mitigate weaknesses before they become disruptive.
- Respond to alerts and service requests, resolving incidents to ensure system uptime and expected service levels.
- Perform deep dives into stability issues and provide solutions that promote system stability and optimization.
- Provide operational support on a rotating, on-call schedule as part of an Operations team.
- Build and set up new development tools, infrastructure, and cloud services.
- Work with software developers and software engineers to ensure that development follows established processes and works as intended.
- Increase coverage of monitoring and alerting capability.
- Perform trend analysis and real-time analysis, resulting in production solutions for increased system and functional stability.
- Investigate, analyse, and document production incidents according to the SLA and urgency.
- Escalate incidents to third-line support if needed (squad, developers, Infra, Network, DB Admins), and work closely with all necessary parties to solve problems.
- Create and maintain up-to-date documentation and procedures for your area of support.
- Ability to effectively interface with technical and nontechnical staff at all organizational levels.
- Excellent problem solving/analytical skills and knowledge of analytical tools.
- Log all incidents accurately and document all investigative activities, including all technical means employed to ascertain the nature of the fault and the remedial action taken.
- Build and maintain monitoring dashboards and alerts to ensure production and system uptime for all systems, both on premises and in the cloud.
- Review security alerts to determine the relevance and urgency of potential threats and take appropriate action to mitigate risk.
- Run vulnerability scans and review vulnerability assessment reports to assess, address and report vulnerabilities to the development teams.
- Manage and configure security monitoring tools (net flows, IDS, correlation rules, etc.) to ensure optimal use and coverage.
- Monitor security access to identify potential risk and address with appropriate actions.
- Define access privileges, control structures and resources to protect systems.
- Contribute to the development and maintenance of security policies, procedures, standards, and awareness, by providing data insights through analysis.
- Ensure that systems are safe and secure against cybersecurity threats.
Minimum Requirements:
- At least one object-oriented and one scripting language (Python, PowerShell, Bash and .NET)
- Familiarity with Docker, Kubernetes & Helm
- Knowledge of Cloud network topologies and configuration techniques such as VLANs, VPNs or VNETs
- Comprehensive knowledge of network protocols and services such as TCP/IP, DNS, and DHCP
- Online version control systems (Subversion, GitHub, Bitbucket)
- An understanding of Azure Cloud infrastructure
- Comfortable working with a small team in a fast-paced environment
- Configuration management and containerization tools
- Common data stores, both relational and NoSQL
- Data integrity, security, and continuity of business
Educational Requirements:
• 3 years of IT Software & System experience, with a focus on IT Operations & DevSecOps
• Good understanding of Unix & Windows Operating systems
• Experience with SSL & TLS
• Proven experience in monitoring capabilities
• Strong understanding of middleware technologies
• Cloud Certifications are an advantage.
• Experience migrating on-prem services to cloud.
• Experience working in an Agile environment, including Agile-based tooling
Desired Skills:
- Python
- Unix/Linux
- Cloud network
- Data integrity