TFG Labs is searching for a Site Reliability Engineer to join our newly formed team responsible for designing and transforming TFG into the leading Omni-channel retailer in Africa.
Are you interested in solving difficult puzzles that emerge when running a fast-moving, constantly evolving, large-scale infrastructure? Does reaching into all parts of a complex framework, from web servers and databases to continuous integration systems to the AWS/GCP cloud, excite you?
Site Reliability Engineer responsibilities include helping to design, expand and maintain our infrastructure, engaging with the rest of the tech team to arrive at solutions that help them perform better, deploying product updates, identifying production issues and implementing integrations that meet business needs. Ultimately, you will execute and automate operational processes quickly, accurately and securely.
If you have a solid background in software engineering and solid experience with Terraform, Helm, Kustomize, Kubernetes, networking and Google Cloud, we would like to meet you.
What you will do:
- Collaborate with other engineering team members to deliver an awesome experience for our customers and our business
- Diagnose and troubleshoot problems in a complex service made up of many distributed components
- Build and grow our technical infrastructure and support decision-making about the direction we should take
- Participate in rotating coverage to handle issues raised by monitoring systems and the TFG Labs team
- Develop, maintain, and configure software to automate processes and improve efficiency
- Integrate open-source, commercially distributed, and custom-developed code to produce the tools needed to support our service delivery and software-development goals
- Test and optimize systems to create a stable operational environment
- Apply SRE best practices, e.g., developing tools that enhance system performance, reliability and engineer experience
- Work with software engineers to ensure that development follows established processes and works as intended
- Plan projects and take part in project management decisions
What you need to be successful:
- BSc degree or equivalent experience
- In-depth knowledge of the building blocks of high-performance web service systems: distributed computing, databases, networking, content-distribution networks, security, etc.
- An endless curiosity about how things work
- Experience with Linux; we use Ubuntu, but transferable skills from other Linux flavors work, too
- Experience with AWS services (EC2, S3, SQS, AutoScaling, ELB, R53, Fargate, Lambda), from console to CLI to API
- Familiarity with databases such as MySQL
- 3+ years’ experience in Cloud architecture and Cloud design across multiple cloud platforms in a large-scale environment
- Cloud and automation engineering experience in a mission-critical environment
- Strong experience with and understanding of a cloud platform, preferably AWS
- Ability to design, implement and document architectures and solutions using a mix of IaaS, PaaS, SaaS and DevOps practices, with a strong focus on automation, internal compliance, monitoring, documentation and cybersecurity
- Ability to design serverless architecture and create automated deployments
- Experience with server-side languages such as Python, Node.js, Java, Golang or PHP would be advantageous
- Proficiency in Bash, Makefiles and one or more scripting languages such as Python or Node.js
- Deep technical knowledge of automation tools such as Terraform
- Experience with Ansible and/or other similar tools is advantageous
- Strong skills and experience across the CI/CD stack that deploys to infrastructure elements such as Kubernetes
- Knowledge of networking services
- A strong knowledge and understanding of data migration practices and technologies
- Experience with NoSQL, RDBMS, graph, key-value and column-store databases is advantageous
* Some of these skills and experiences are preferred and are not strict requirements. For our senior roles we look for broad experiences mixed with a variety of specialty expertise.
- Excellent communication skills and a passion to grow together with colleagues
- An ability to work as part of a team but also in a self-directed manner when a task requires an independent, autonomous approach
- An ability to think independently, discuss open-mindedly and assertively, and value getting it right over being right, generating rapid improvements in yourself and the organization
- A commitment to doing meaningful work and building meaningful relationships
- A strong sense of personal accountability and ownership to do the right thing, even when it is difficult
- An internal drive for excellence
- Strong problem-solving skills
- Strong attention to detail
- Awareness of DevOps and Agile principles, values and processes
- Ability to set your ego aside and assess yourself candidly
Preference will be given, but not limited, to candidates from designated groups in terms of the Employment Equity Act.