This position is responsible for implementing, maintaining, enabling and facilitating DevOps practices as well as optimizing the architecture and processes of the product and platforms required to meet business goals and objectives.

  • Implement and maintain infrastructure required for implementing DevOps
  • Enable automated deployment of applications and
  • Enable automated monitoring and
  • Enable automated end-to-end
  • Enable continuous release processes, practices and
  • Enable change management and audit requirements for release
  • Interest in designing, analyzing and troubleshooting large-scale distributed
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and
  • Ability to debug and optimize code and automate routine
  • Scale systems sustainably through mechanisms such as easy-to-use tooling and automation
  • Practice sustainable incident response and drive root cause analysis

Competencies Required

  • Client / stakeholder commitment
  • Drive for results
  • Leads change and innovation
  • Impact and influence
  • Self-awareness and insight
  • Diversity and inclusiveness
  • Collaboration
  • Governance
  • Strong critical, analytical and research skills
  • Desire to teach and mentor others
  • Self-motivated, organized and able to work independently and as part of a team
  • Linux
  • Be proficient in shell scripting
  • Have a very good understanding of Linux operating systems
  • Be able to identify OS-level issues and resolve them with minimal downtime
  • Be able to identify services running and their network configuration
  • WAS
  • Understand the basic operation of the WebSphere Application Server
  • Be able to identify a fault in a particular node
  • Be able to view logs via SSH on a file mount, as well as via Kibana
  • Queues
  • Have a good understanding of queuing and queuing systems such as IBM MQ
  • Jenkins
  • Have a very good understanding of Jenkins
  • Be able to find and identify faults with slaves running on remote Docker servers
  • Be able to find slave SSH access key issues
  • Ansible
  • Have experience with creating and maintaining Ansible jobs
  • NginX
  • Understand reverse proxies
  • Be able to read the nginx documentation and use it to extend our automated deployments and configuration
  • Be able to pull metrics and identify trends and faults from nginx logs in Kibana
  • Understand the impact of DNS resolution and nginx upstreams
  • Consul
  • Understand the concept of a central key-value store
  • Understand multi-node single-leader clusters
  • Be able to identify server-client communication faults
  • Understand service registration
  • Understand configuration templates
  • Docker
  • Have a very good understanding of containerization
  • Understand multi-tenant systems and the implications of load balancing across multiple instances
  • Be able to find faults in container setup and deployments
  • Have a good understanding of volume mounts and layered file systems
  • Kubernetes
  • Have a good understanding of container orchestration
  • Understand cluster DNS
  • Have experience with Istio service mesh
  • Have a good understanding of namespaces and quotas
  • Understand Kubernetes secrets and mounts
  • Have experience with log tailing and event monitoring
  • Be able to manage an EKS cluster
  • Networking
  • Know what a CIDR is
  • Have a good understanding of general networking
  • Be able to identify network faults
  • Have a good understanding of firewalls
  • Be able to set up and debug AWS Security Groups
  • Understand AWS VPCs and subnets
  • Monitoring
  • Be proficient with KQL and the Elasticsearch DSL
  • Be proficient with Prometheus queries and configuration
  • Understand Grafana or similar monitoring and alerting tools
  • Be proficient with CloudWatch metrics and logs
  • Have a good understanding of tracing using tools such as Jaeger
  • Repositories
  • Have a very good proficiency with Git
  • Be proficient with GitLab administration and GitLab pipelines
  • Understand Docker and Maven registries and repositories such as Nexus and Artifactory
  • Databases
  • Be proficient with MongoDB and MongoDB Ops Manager
  • Be proficient in SQL
  • Have a good understanding of the PostgreSQL DBMS
  • Have experience with AWS RDS Aurora PostgreSQL
  • AWS
  • Understand EC2 features, such as instance types, snapshots, ELB, and EBS
  • Be proficient in CloudFormation
  • Understand autoscaling and its cost implications
  • Be proficient with creating and deploying AWS Lambda functions
  • Understand IAM policies, users and roles
  • Have experience with Route 53 and a good understanding of DNS in general
  • Understand object storage with S3
  • Programming Languages
  • Python
  • Java
  • JavaScript
  • Go Template Language

Qualifications and Experience

  • Relevant IT degree/diploma/certification
  • 4+ years of experience as a Site Reliability Engineer or in a similar role as an enabler of DevOps practices
  • 4+ years of experience as a Software Engineer, Java Developer or Middleware Administrator

Desired Skills:

  • DevOps Engineering
  • Python
  • Java
  • JavaScript
  • AWS
  • Cloud
  • Git
