Main Responsibilities:
- The performance of the Data Engineer is described and measured by the following responsibilities:
- Define a structured approach to problem solving and deliver against it
- Create role-specific design standards, patterns, and principles
- Assist in planning and managing the team's workload to ensure delivery
- Load large, complex data sets and make data available to other data engineers
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and redesigning models for greater scalability
- Work with other data engineers and data modelers to design, implement, and manage data vaults, data transformations, and the data pipeline
- Identify, design, and implement vault access layers to enable BI products to leverage the data within data vaults
- Monitor and fine-tune data vaults and data transformations on the Cloudera Hadoop stack
- Use modern development and modelling techniques and tools to implement BI and data management solutions, including data quality, metadata and reference data
- Engage with a wide range of technical stakeholders including data scientists, data analysts, business analysts, other data engineers and solutions architects
- Support data stewards in establishing and enforcing guidelines for data collection, quality improvement, integration, and related processes
Role Requirements:
Qualifications:
- Bachelor’s degree in Computer Science, Statistics, Informatics, Information Systems, Engineering, or another quantitative field, or a National Diploma in an Information Technology-related discipline, preferred
Work Experience:
- The Junior Data Engineer must have experience in a similar environment, working with the tools and techniques listed below
Technical Knowledge and Experience:
- The Junior Data Engineer is someone with a strong understanding of data, data structures and data sources. Required skills include:
- Application and data engineering experience, with a solid grounding in SQL, is required
- Knowledge of database management system (DBMS) physical implementation, including tables, joins, and SQL querying
- Data architecture design and delivery experience preferred
- Experience with database technologies (e.g., SAP HANA, Teradata, or similar) or Hadoop components, including HDFS, Hive, Spark, Oozie, and Impala, is highly advantageous
- Experience with object-oriented/object-functional scripting languages (e.g., Python, Java, Scala, or related)
- Knowledge and experience of structured data, such as entities, classes, hierarchies, relationships, and metadata
- Strong data engineering background with a specific focus on staging high-quality data
- Understanding of data warehousing principles (e.g., Kimball and Data Vault)
- Experience in agile development
- Ability to manage data assets in compliance with a strict governance framework
Desirable/preferred skills include:
- Data warehousing (Kimball and Data Vault patterns preferred) and dimensional data modelling (e.g., OLAP and MDX experience)
- Experience developing data pipelines using ETL tools (e.g., SAP Data Services), automation (e.g., WhereScape), and scheduling and test automation (e.g., Robot)
- A solid background in SQL, information architecture, and ETL procedures is required; experience with object-oriented/functional scripting languages (e.g., Python, Unix shell scripting, Java, Scala) is preferred but not essential
- Data management technologies (e.g., Informatica Data Quality (IDQ), Informatica Enterprise Data Catalog (EDC), Axon, EBX)
- Event/streaming-based data pipelines (e.g., Kafka or NiFi) are nice to have
Desired Skills:
- Data
- Python
- Scala
- SAP
Desired Work Experience:
- 1 to 2 years
Desired Qualification Level:
- Degree