The role entails building a reusable, sustainable framework to ensure the collection, processing and availability of high-quality healthcare data, enabling us to achieve our core purpose. The Data Engineer will work collaboratively with Program Managers, Data Scientists and Systems Architects to define data sources and to build a custom data framework that facilitates Machine Learning, AI and the productionisation of AI models, based on ETL/ELT principles. Together these teams will enable data-driven, actionable insights.
Core responsibilities include:

  • Work within a highly specialized and growing team to deliver data and advanced analytics system capabilities.
  • Develop and implement a reusable data pipeline architecture to make data available for a range of purposes, including Machine Learning (ML), analytics and reporting
  • Work collaboratively as part of a team, engaging with system architects, data scientists and business stakeholders in a healthcare context
  • Define hardware, tools and software to enable the reusable framework for data sharing and ML model productionization
  • Work comfortably with structured and unstructured data in a variety of programming languages such as SQL, R, Python and Java
  • Understand distributed programming and advise data scientists on how to structure program code for maximum efficiency
  • Build data solutions that leverage controls to ensure privacy, security, compliance, and data quality
  • Understand metadata management systems and orchestration architecture when designing ML/AI pipelines
  • Apply a deep understanding of cutting-edge cloud technologies and frameworks that enable data science
  • Integrate Business Intelligence systems with source transactional systems
  • Improve the overall production landscape as required
  • Define strategies with Data Scientists to monitor models in production
  • Write unit tests and participate in code reviews

What you need to be successful

  • Honours (BSc) or Master’s degree in Computer Science, Engineering or Software Engineering, with solid experience in data mining and machine learning
  • 5 to 15 years of work experience
  • Expert in programming languages such as R, Python, Scala and Java
  • Expert database knowledge in SQL and experience with Microsoft Azure tools such as Data Factory, Synapse Analytics, Data Lake, Databricks, Azure Stream Analytics and Power BI
  • Modern Azure data warehouse skills
  • Expert Unix/Linux admin experience including shell script development
  • Exposure to AI or model development
  • Experience working on large and complex datasets
  • Understanding and application of Big Data and distributed computing principles (e.g. Hadoop and MapReduce)
  • ML model optimization skills in a production environment
  • Experience running machine learning and AI in production environments
  • DevOps/DataOps and CI/CD experience
  • AWS experience

Desired Skills:

  • R
  • Python
  • Scala
  • Java
  • SQL
  • Microsoft Azure
  • Data Factory
  • Synapse
  • Data Lake
  • Databricks
  • Power BI

Desired Work Experience:

  • 5 to 10 years

Desired Qualification Level:

  • Degree
