Type: Contract (12 Months – with option to renew every 12 Months)
Level: Senior
Location: Offices are based in Centurion, but position is/can be remote
Salary: R75 000 – R100 000 gross per month | R900 000 – R1 200 000 gross per annum (includes 15 days' holiday and sick leave)
JOB OVERVIEW/ROLE PURPOSE
- This position is responsible for implementing, maintaining, enabling, and facilitating DevOps practices as well as optimizing the architecture and processes of the product and platforms required to meet business goals and objectives.
MINIMUM REQUIREMENTS
Qualifications:
- Degree in IT/Computer Science or a relevant field
- Any other related certification
Experience & Skills
- 4+ years’ experience as a Site Reliability Engineer or in a similar role as an enabler of DevOps practices.
- 4+ years’ experience as a Software Engineer, Java Developer, or Middleware Administrator.
Experience & Technology Requirements:
Programming Languages:
- Python
- Java
- JavaScript
- Go Template Language
Linux:
- Be proficient in shell scripting
- Have a very good understanding of Linux operating systems
- Be able to identify OS level issues and resolve them with minimal down-time
- Be able to identify services running and their network configuration
WAS (WebSphere Application Server):
- Understand the basic operation of the WebSphere Application Server
- Be able to identify a fault in a particular node
- Be able to view logs via SSH on file mount, as well as via Kibana
Queues:
- Have a good understanding of queuing and queuing systems such as IBM MQ
Jenkins:
- Have a very good understanding of Jenkins
- Be able to find and identify faults with slaves running on remote Docker servers
- Be able to find slave SSH access key issues
Ansible:
- Have experience with creating and maintaining Ansible jobs
NginX:
- Understand reverse proxies
- Be able to read the NginX documentation and use it to extend our automated deployments and configuration
- Be able to pull metrics and identify trends and faults from NginX logs in Kibana
- Understand the impact of DNS resolution and NginX upstreams
Consul:
- Understand the concept of a central key-value store
- Understand multi-node single-leader clusters
- Be able to identify server-client communication faults
- Understand service registration
- Understand configuration templates
Docker:
- Have a very good understanding of containerization
- Understand multi-tenant systems and the implications of load balancing across multiple instances
- Be able to find faults in container setup and deployments
- Have a good understanding of volume mounts and layered file systems
Kubernetes:
- Have a good understanding of container orchestration
- Understand cluster DNS
- Have experience with Istio service mesh
- Have a good understanding of namespaces and quotas
- Understand Kubernetes secrets and mounts
- Have experience with log trailing and event monitoring
- Be able to manage an EKS cluster
Networking:
- Know what a CIDR is
- Have a good understanding of general networking
- Be able to identify network faults
- Have a good understanding of firewalls
- Be able to set up and debug AWS Security Groups
- Understand AWS VPCs and subnets
Monitoring:
- Be proficient with KQL and the ElasticSearch DSL
- Be proficient with Prometheus queries and configuration
- Understand Grafana or similar monitoring and alerting tools
- Be proficient with CloudWatch metrics and logs
- Have a good understanding of tracing using tools such as Jaeger
Repositories:
- Have a very good proficiency with Git
- Be proficient with GitLab administration and GitLab pipelines
- Understand Docker and Maven registries and repositories such as Nexus and Artifactory
Databases:
- Be proficient with MongoDB and MongoDB Ops Manager
- Be proficient in SQL
- Have a good understanding of the PostgreSQL DBMS
- Have experience with AWS RDS Aurora PostgreSQL
AWS:
- Understand EC2 features, such as instance types, snapshots, ELB, and EBS
- Be proficient in CloudFormation
- Understand autoscaling and its cost implications
- Be proficient with creating and deploying AWS Lambda functions
- Understand IAM policies, users, and roles
- Have experience with Route53 and a good understanding of DNS in general
- Understand object storage with S3
Duties:
- Implement and maintain infrastructure required for implementing DevOps practices.
- Enable automated deployment of applications and configurations.
- Enable automated monitoring and alerting.
- Enable automated end-to-end testing.
- Enable continuous release processes, practices, and pipelines.
- Enable change management and audit requirements for release pipelines.
- Interest in designing, analysing, and troubleshooting large-scale distributed systems.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Ability to debug and optimize code and automate routine tasks.
- Scale systems sustainably through mechanisms such as easy-to-use tooling and automation.
- Practice sustainable incident response and drive root cause analysis.
Personal Attributes:
- Client / stakeholder commitment
- Drive for results
- Leads change and innovation
- Impact and influence
- Self-awareness and insight
- Diversity and inclusiveness
- Collaboration
- Governance
- Strong critical, analytical and research skills
- Desire to teach and mentor others
- Self-motivated, organized, and able to work independently and as part of a team
Desired Skills:
- DevOps
- Site Reliability Engineer
- DevOps practices
- Software Engineering
- Java
- Developer Middleware Administrator
- Python
- JavaScript
- Go Template
- Linux
- Shell Scripting
- Node
- SSH
- Queues
- IBM MQ
- Jenkins
- Slaves Running
- Docker Servers
- Ansible
- NginX
- DNS Resolution
- Consul
- Docker
- Kubernetes
- DNS
- Istio
- Manage EKS Cluster
- Networking
- CIDR
- Understanding Firewalls
- Set up and debug AWS Security Groups
- AWS VPCs
- Monitoring
- KQL and the ElasticSearch DSL
- Prometheus queries and configuration
- Grafana
- CloudWatch
- Jaeger
- Git
- GitLab
- Maven
- Databases
- MongoDB and MongoDB Ops Manager
- SQL
- Snapshots
- ELB
- EBS
- Creating and deploying AWS Lambda functions
- Understand IAM policies
- Object storage with S3
- Route53
- Nexus and Artifactory
- PostgreSQL DBMS
Desired Work Experience:
- 1 to 2 years Investments, Insurance & Assurance
- 5 to 10 years Software Development
Desired Qualification Level:
- Degree
About The Employer:
Investments Retail
Employer & Job Benefits:
- 15 days holiday
- Sick leave
- Public Holidays