Data Engineer

Date: 2019-04-01

JOB PURPOSE:

Data Engineers build and support data pipelines and the datamarts built off those pipelines. Both must be scalable, repeatable and secure. They help get data from a variety of sources into the correct format, ensuring that it conforms to data quality standards and that downstream users can get to that data timeously. This role functions as a core member of an agile team.

These professionals are responsible for the infrastructure that provides insights from raw data, handling and integrating diverse sources of data seamlessly. They enable solutions that handle large volumes of data in batch and real time, leveraging emerging technologies from both the big data and cloud spaces. Additional responsibilities include developing proofs of concept and implementing complex big data solutions with a focus on collecting, parsing, managing, analysing and visualising large datasets. They know how to apply technologies to solve the problems of working with large volumes of data in diverse formats to deliver innovative solutions.

Data Engineering is a technical job that requires substantial expertise in a broad range of software development and programming fields. These professionals have knowledge of data analysis, end-user requirements and business requirements analysis, which they use to develop a clear understanding of the business need and to incorporate that need into a technical solution. They have a solid understanding of physical database design and the systems development lifecycle. This role must work well in a team environment.

EXPERIENCE AND QUALIFICATIONS:

– 3-year IT-related degree or diploma

– AWS Certification (at least to associate level)

– 4+ years Business Intelligence

– 4+ years Extract, Transform and Load (ETL) processes

– 2+ years Cloud AWS

– 2+ years Agile exposure (Kanban or Scrum)

– 2+ years Creating data feeds from on-premise to AWS Cloud

– 2+ years Supporting data feeds in production on a break-fix basis

– 4+ years Creating data marts using Talend or a similar ETL development tool

– 2+ years Manipulating data using Python and PySpark

– 2+ years Processing data using the Hadoop paradigm, particularly using EMR, AWS’s distribution of Hadoop

– 2+ years DevOps for Big Data and Business Intelligence, including automated testing and deployment

– 1+ year Talend

– 1+ year AWS: EMR, EC2, S3

– 1+ year Python

– 1+ year Business Intelligence data modelling

– 3+ years SQL

WHAT YOU’LL DO:

– Design and develop data feeds from an on-premise environment into a datalake in an AWS cloud environment.

– Design and develop programmatic transformations of the data to correctly partition it, format it and validate or correct its data quality.

– Design and develop programmatic transformations, combinations and calculations to populate complex datamarts based on feeds from the datalake.

– Provide operational support for datamart data feeds and datamarts.

– Design infrastructure required to develop and operate datalake data feeds.

– Design infrastructure required to develop and operate datamarts, their user interfaces and the feeds required to populate them.

