Position Purpose:
- Data Engineers build and support data pipelines and the datamarts built from those pipelines. Both must be scalable, repeatable, and secure.
- The Data Engineer facilitates gathering data from a variety of sources, in the correct format, ensuring that it conforms to data quality standards and that downstream users can access it timeously.
- This role functions as a core member of an agile team.
- These professionals are responsible for the infrastructure that provides insights from raw data, handling and integrating diverse sources of data seamlessly.
- They enable solutions by handling large volumes of data in batch and real time, leveraging emerging technologies from both the big data and cloud spaces. Additional responsibilities include developing proofs of concept and implementing complex big data solutions with a focus on collecting, parsing, managing, analyzing, and visualizing large datasets.
- They know how to apply technologies to solve the problems of working with large volumes of data in diverse formats to deliver innovative solutions.
- Data Engineering is a technical job that requires substantial expertise in a broad range of software development and programming fields.
- These professionals have knowledge of data analysis, end-user requirements, and business requirements analysis, developing a clear understanding of the business need and incorporating it into a technical solution.
- They have a solid understanding of physical database design and the systems development lifecycle.
- This role must work well in a team environment.
Qualifications:
- IT Degree or Diploma (3-year qualification)
Desirable:
- AWS Certification at least to associate level
Experience:
- Business Intelligence (3-5 years)
- Extract Transform and Load (ETL) processes (3-5 years)
- Agile exposure (Kanban or Scrum) (2+ years)
Desirable:
- Retail Operations (3-5 years)
- Big Data (1+ years)
- Cloud AWS (1+ years)
Job objectives:
- Design and develop data feeds from an on-premises environment into a datalake hosted in AWS
- Design and develop the solution's programmatic transformations, correctly partitioning, formatting, and validating the data (see the sketch after this list)
- Design and develop programmatic transformations, combinations, and calculations to populate complex datamarts based on feeds from the datalake
- Provide operational support for data feeds and datamarts
- Design the infrastructure required to develop and operate datalake data feeds
- Design the infrastructure required to develop and operate datamarts, their user interfaces, and the feeds required to populate them.
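For illustration, a minimal PySpark sketch of the partition/format/validate pattern named in the objectives above; the bucket names, paths, and column names are hypothetical assumptions, not part of the role specification:

    # Minimal sketch: land a raw feed, validate data quality, then write
    # partitioned Parquet to the datalake. All paths and columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("feed-to-datalake").getOrCreate()

    # Raw feed as landed from the on-premises environment.
    raw = spark.read.csv("s3://example-landing/sales/", header=True, inferSchema=True)

    # Validate: rows missing mandatory keys are quarantined, not silently dropped.
    valid = raw.filter(F.col("sale_id").isNotNull() & F.col("sale_date").isNotNull())
    raw.subtract(valid).write.mode("append").parquet("s3://example-datalake/quarantine/sales/")

    # Format and partition: columnar Parquet, partitioned by date for
    # efficient downstream datamart queries.
    (valid
     .withColumn("sale_date", F.to_date("sale_date"))
     .write.mode("overwrite")
     .partitionBy("sale_date")
     .parquet("s3://example-datalake/curated/sales/"))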
Knowledge & Skills:
Knowledge:
- Creating data feeds from on-premises to AWS Cloud (1 year)
- Supporting data feeds in production on a break-fix basis (1 year)
- Creating data marts using Talend or a similar ETL development tool (2 years)
- Manipulating data using Python and PySpark (1 year)
- Processing data using the Hadoop paradigm, particularly on Amazon EMR, AWS's managed Hadoop platform (1 year)
- DevOps for Big Data and Business Intelligence, including automated testing and deployment (1 year); see the test sketch after this list
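For illustration, a minimal pytest sketch of the kind of automated test the DevOps item above implies; the transformation and the test data are hypothetical:

    # Sketch of an automated test for a pipeline transformation, per the DevOps
    # item above. The transformation and the data are hypothetical.
    import pytest
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

    def dedupe_latest(df):
        # Hypothetical transformation: keep the newest row per sale_id.
        w = Window.partitionBy("sale_id").orderBy(F.col("updated_at").desc())
        return df.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")

    def test_dedupe_latest_keeps_newest_row(spark):
        df = spark.createDataFrame(
            [(1, "2024-01-01"), (1, "2024-02-01"), (2, "2024-01-15")],
            ["sale_id", "updated_at"],
        )
        rows = {(r.sale_id, r.updated_at) for r in dedupe_latest(df).collect()}
        assert rows == {(1, "2024-02-01"), (2, "2024-01-15")}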
Skills:
- Talend (1 year)
- Python (1 year)
- Business Intelligence Data modelling (3 years)
- SQL (3 years)
Desirable:
- AWS: EMR, EC2, S3 (1 year)
- PySpark or Spark (1 year)
Desired Skills:
- Business Intelligence Data modelling
- SQL
- Python
- Talend