DevOps Engineer (AI and Services) - Gauteng Midrand

DevOps Engineer (AI and Services)
Location: Midrand, Gauteng, South Africa
Employment Type: Full-time, office-based
Reporting Line: General Manager – AI and Services
Contact: Chanel Lubbe – Associate Talent Specialist ([Email Address Removed])

Job Purpose
The DevOps Engineer will be responsible for deploying, managing, and optimizing the AI software stack to support our AI-driven applications. This role involves close collaboration with data scientists and machine learning engineers to ensure the seamless integration of AI models and services within our enterprise environment.

Key Responsibilities

Design, implement, and manage on-premises and hybrid infrastructure for AI solutions.
Leverage tools like NVIDIA AI Enterprise to streamline containerized application deployments and manage GPU resources effectively.
Automate deployment and configuration of AI software solutions, ensuring that machine learning models and AI frameworks (e.g., TensorFlow, PyTorch) are optimized for performance.
Develop scripts and tools in Python to facilitate rapid deployment of AI applications across various environments.
Implement CI/CD pipelines tailored to AI workloads using tools such as Jenkins, GitLab CI, or CircleCI, including builds that depend on GPU libraries such as cuDNN.
Collaborate with data science teams to ensure efficient model versioning and deployment strategies.
Establish monitoring solutions to track the performance and utilization of AI resources and systems for efficient and reliable operations.
Analyze system performance, identify bottlenecks, and implement tuning strategies for optimal GPU and application performance.
Work closely with cross-functional teams, including data scientists, ML engineers, and IT, to support the integration of AI solutions into business applications.
Provide guidance and support for best practices in AI model training and deployment, ensuring effective use of AI tools and solutions.
Implement security measures and best practices to safeguard data and AI models.
Ensure compliance with relevant data protection regulations and industry standards.
Create and maintain comprehensive documentation related to infrastructure setups, deployment processes, and operational guidelines.
Conduct training sessions for team members on AI tools, platforms, DevOps practices, and workflows.

Requirements
Experience and Knowledge

3+ years of experience in a DevOps role, with a focus on automation, AI, machine learning, or data engineering.
Hands-on experience with NVIDIA AI Enterprise software is an advantage.
Experience with technologies such as Service Fabric, Redis, Rancher, ASP.NET, .NET Core, RabbitMQ, Elastic Stack, Git, APIs, and Terraform is beneficial.
Knowledge of AI platforms (e.g., NVIDIA, Intel, OpenShift AI, Kubernetes), AI models (e.g., Hugging Face, NVIDIA, Llama, GPT), and AI infrastructure (e.g., Dell AI Factory, Supermicro, Nutanix AI).

Skills and Education

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
Proficiency in Python and scripting languages such as Bash for automation and tool development.
Familiarity with containerization technologies (Docker, Kubernetes, Rancher) as they relate to AI workloads.
Understanding of machine learning frameworks (TensorFlow, PyTorch) and their deployment.
Knowledge of programming languages, data formats, and infrastructure-as-code tools such as Python, JavaScript, YAML, JSON, Terraform, and Ansible.
Understanding of messaging protocols, APIs, SDKs, and open-source databases.
Fundamental understanding of networking concepts like TCP/IP, DNS, TLS, and load balancing.
Strong analytical and problem-solving skills with keen attention to detail.
Excellent communication and teamwork abilities, with a collaborative mindset.
Ability to adapt to a fast-paced environment and manage multiple priorities effectively.

Desired Skills:

Python
DevOps
Machine Learning
NVIDIA
Kubernetes
Supermicro
