We see a data engineer as someone who works behind the scenes to obtain, process, and supply data, using a range of methodologies and technologies, to various consumers, in ways and forms that make sense and add value. This definition is deliberately broad, because the field of data engineering is just as broad.
You may be the type of data engineer who develops API endpoints so that end users, or even other data pipelines, can consume data; or you may be the type who builds highly distributed, highly available data processing pipelines to satisfy the needs of ever-questioning data analysts and data scientists.
For this role, we’re looking for experienced Big Data Engineers who have built data warehouses and/or transactional data models within a Big Data environment.
That means Spark/PySpark experience, experience with the Hadoop ecosystem, and coding experience in Python, Scala, or Java. We need someone who can build data pipelines (ELT/ETL) within the Big Data environment.
Desired Skills:
- In-depth architectural knowledge of Spark and Hadoop
- Expert in building ETL pipelines using Spark (PySpark)
- Experience using Spark with HDFS
- Experience writing data pipelines using functional programming (Python, Java, or Scala)
- Advanced ANSI SQL experience
- Firm understanding of Big Data and traditional data processing, including an in-depth understanding of their differences, in order to make informed design decisions
- Firm understanding of data modelling: OLAP vs. OLTP vs. hybrid models
- Firm understanding of dimensional modelling (e.g. Kimball)
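To illustrate the kind of pipeline work the skills above describe, here is a toy, stdlib-only sketch of an extract-transform-load step written in a functional style. All names (`Sale`, `extract`, `transform`, `load`) are illustrative, and a plain dict stands in for a warehouse table; a real pipeline would use Spark/PySpark against HDFS rather than in-memory Python.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sale:
    region: str
    amount: float

def extract(rows):
    # Extract: parse raw (region, amount) tuples into typed records.
    return [Sale(region=r, amount=float(a)) for r, a in rows]

def transform(sales, min_amount=0.0):
    # Transform: filter in a functional style, then aggregate totals per region.
    totals = {}
    for s in filter(lambda s: s.amount >= min_amount, sales):
        totals[s.region] = totals.get(s.region, 0.0) + s.amount
    return totals

def load(totals, sink):
    # Load: write the aggregates into a sink standing in for a warehouse table.
    sink.update(totals)
    return sink

raw = [("EMEA", "100.0"), ("APAC", "250.5"), ("EMEA", "49.5")]
warehouse = load(transform(extract(raw)), {})
print(warehouse)  # {'EMEA': 149.5, 'APAC': 250.5}
```

The same extract/transform/load structure maps directly onto Spark: `extract` becomes reading a source into a DataFrame, `transform` becomes column expressions and aggregations, and `load` becomes writing to a warehouse table.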
Desired Qualification Level:
- Degree