- Implement and maintain infrastructure required for implementing DevOps practices.
- Enable automated deployment of applications and configurations.
- Enable automated monitoring and alerting.
- Enable automated end-to-end testing.
- Enable continuous release processes, practices and pipelines.
- Enable change management and audit requirements for release pipelines.
- Interest in designing, analyzing and troubleshooting large-scale distributed systems.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Ability to debug and optimize code and automate routine tasks.
- Scale systems sustainably through mechanisms such as easy-to-use tooling and automation.
- Practice sustainable incident response and drive root cause analysis.
- Client / stakeholder commitment
- Drive for results
- Leads change and innovation
- Impact and influence
- Self-awareness and insight
- Diversity and inclusiveness
- Strong critical, analytical and research skills
- Desire to teach and mentor others
- Self-motivated, organized and able to work independently and as part of a team
Technology and Skill Requirements:
1) Linux
- Be proficient in shell scripting
- Have a very good understanding of Linux operating systems, and be able to identify OS-level issues and resolve them with minimal downtime
- Be able to identify running services and their network configuration
2) WAS
- Understand the basic operation of the WebSphere Application Server
- Be able to identify a fault in a particular node
- Be able to view logs via SSH on a file mount, as well as via Kibana
3) Queues
- Have a good understanding of queuing and queuing systems such as IBM MQ
4) Jenkins
- Have a very good understanding of Jenkins
- Be able to find and identify faults with slaves running on remote docker servers
- Be able to find slave SSH access key issues
5) Ansible
- Have experience with creating and maintaining Ansible jobs
6) Nginx
- Understand reverse proxies
- Be able to read the nginx documentation and use it to extend our automated deployments and configuration
- Be able to pull metrics and identify trends and faults from nginx logs in Kibana
- Understand the impact of DNS resolution and nginx upstreams
7) Consul
- Understand the concept of a central key-value store
- Understand multi-node single-leader clusters
- Be able to identify server-client communication faults
- Understand service registration
- Understand configuration templates
8) Docker
- Have a very good understanding of containerization
- Understand multi-tenant systems and the implications of load balancing across multiple instances
- Be able to find faults in container setup and deployments
- Have a good understanding of volume mounts and layered file systems
9) Kubernetes
- Have a good understanding of container orchestration
- Understand cluster DNS
- Have experience with Istio service mesh
- Have a good understanding of namespaces and quotas
- Understand Kubernetes secrets and mounts
- Have experience with log tailing and event monitoring
- Be able to manage an EKS cluster
10) Networking
- Know what a CIDR is
- Have a good understanding of general networking
- Be able to identify network faults
- Have a good understanding of firewalls
- Be able to set up and debug AWS Security Groups
- Understand AWS VPCs and subnets
11) Monitoring
- Be proficient with KQL and the Elasticsearch DSL
- Be proficient with Prometheus queries and configuration
- Understand Grafana or similar monitoring and alerting tools
- Be proficient with CloudWatch metrics and logs
- Have a good understanding of tracing using tools such as Jaeger
12) Repositories
- Have a very good proficiency with Git
- Be proficient with GitLab administration and GitLab pipelines
- Understand Docker and Maven registries and repositories such as Nexus and Artifactory
13) Databases
- Be proficient with MongoDB and MongoDB Ops Manager
- Be proficient in SQL
- Have a good understanding of the PostgreSQL DBMS
- Have experience with AWS RDS Aurora PostgreSQL
14) AWS
- Understand EC2 features, such as instance types, snapshots, ELB, and EBS
- Be proficient in CloudFormation
- Understand autoscaling and its cost implications
- Be proficient with creating and deploying AWS Lambda functions
- Understand IAM policies, users and roles
- Have experience with Route53 and a good understanding of DNS in general
- Relevant IT degree/diploma/certification
- 4+ years of experience as a Site Reliability Engineer or in a similar role as an enabler of DevOps practices.
- 4+ years of experience as a Software Engineer, Java Developer or Middleware Administrator
Desired Work Experience:
- 2 to 5 years Financial Advisory & Consulting Service
- 2 to 5 years Software Development
Desired Qualification Level:
About The Employer:
A company in the insurance and financial industry, located in Centurion.