The role entails building a reusable, sustainable framework to ensure the collection, processing and availability of high-quality healthcare data, enabling us to achieve our core purpose. The Data Engineer will work collaboratively with Program Managers, Data Scientists and Systems Architects to define data sources and to build a custom data framework that facilitates Machine Learning, AI and the productionisation of AI models, based on ETL/ELT principles. Together these teams will enable data-driven, actionable insights.
Core responsibilities include:

  • Work within a highly specialized and growing team to deliver data and advanced analytics system capabilities.
  • Develop and implement a reusable data pipeline architecture to make data available for a range of purposes, including Machine Learning (ML), analytics and reporting
  • Work collaboratively as part of a team, engaging with system architects, data scientists and business stakeholders in a healthcare context
  • Define hardware, tools and software to enable the reusable framework for data sharing and ML model productionization
  • Work comfortably with structured and unstructured data in a variety of programming languages such as SQL, R, Python and Java
  • Understand distributed programming and advise data scientists on how to structure program code for maximum efficiency
  • Build data solutions that leverage controls to ensure privacy, security, compliance, and data quality
  • Understand metadata management systems and orchestration architecture when designing ML/AI pipelines
  • Apply a deep understanding of cutting-edge cloud technologies and frameworks that enable data science
  • Integrate Business Intelligence systems with source transactional systems
  • Improve the overall production landscape as required
  • Define strategies with Data Scientists to monitor models in production
  • Write unit tests and participate in code reviews

What you need to be successful

  • Honours (BSc) or Master’s degree in Computer Science, Engineering or Software Engineering, with solid experience in data mining and machine learning
  • 5 to 15 years of work experience
  • Expert in programming languages such as R, Python, Scala and Java
  • Expert database knowledge in SQL and experience with Microsoft Azure tools such as Data Factory, Synapse Analytics, Data Lake, Databricks, Azure Stream Analytics and Power BI
  • Modern Azure data warehouse skills
  • Expert Unix/Linux admin experience including shell script development
  • Exposure to AI or model development
  • Experience working on large and complex datasets
  • Understanding and application of Big Data and distributed computing principles (e.g. Hadoop and MapReduce)
  • ML model optimization skills in a production environment
  • Experience running machine learning and AI in production environments
  • DevOps/DataOps and CI/CD experience
  • AWS experience

Desired Skills:

  • R
  • Python
  • Scala
  • Java
  • SQL
  • Microsoft Azure
  • Data Factory
  • Synapse
  • Data Lake
  • Databricks
  • Power BI

Desired Work Experience:

  • 5 to 10 years

Desired Qualification Level:

  • Degree
