- Building and maintaining big data pipelines using data platforms.
- Expertise in data modelling and Oracle SQL.
- Analysing large and complex data sets.
- Performing thorough testing and data validation to ensure the accuracy of data transformations.
- Working with enterprise collaboration tools such as Confluence and JIRA.
- Knowledge of data formats such as Parquet, Avro, JSON, XML, and CSV.
- Working with data quality tools such as Great Expectations (a brief validation sketch follows this list).
- Developing and working with REST APIs is a bonus.
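The testing and data-quality bullets above amount to assertion-style checks on transformed data. Below is a minimal PySpark sketch of that kind of validation, under assumptions: the bucket path and the column names (order_id, amount) are hypothetical, and the thresholds are illustrative. Tools such as Great Expectations package comparable checks as declarative expectation suites.

```python
# Minimal validation sketch; dataset path and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("order-validation").getOrCreate()

# Read the output of a transformation step (hypothetical location).
df = spark.read.parquet("s3://example-bucket/orders/")

# Basic post-transformation checks: no null keys, no duplicate keys,
# and amounts within an expected range.
null_keys = df.filter(F.col("order_id").isNull()).count()
duplicate_keys = df.count() - df.dropDuplicates(["order_id"]).count()
out_of_range = df.filter((F.col("amount") < 0) | (F.col("amount") > 1_000_000)).count()

assert null_keys == 0, f"{null_keys} rows have a null order_id"
assert duplicate_keys == 0, f"{duplicate_keys} duplicate order_id values"
assert out_of_range == 0, f"{out_of_range} rows with amount outside the expected range"
```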
Minimum Requirements:
Above-average experience/understanding of the following (in order of importance):
- Terraform
- Python 3.x
- SQL – Oracle/PostgreSQL
- PySpark (a short ETL sketch follows this list)
- Boto3
- ETL
- Docker
- Linux / Unix
- Big Data
- PowerShell / Bash
- Cloud Data Hub (CDH)
- CDEC Blueprint
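To illustrate how several of the items above fit together (Python 3.x, PySpark, Boto3, ETL, S3), here is a minimal sketch under assumptions: boto3 discovers raw CSV objects in a hypothetical landing bucket and PySpark rewrites them as partitioned Parquet. Bucket names, prefixes, and column names are invented for the example.

```python
# Minimal ETL sketch: boto3 for object discovery, PySpark for the transform.
import boto3
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

RAW_BUCKET = "example-raw-bucket"                      # hypothetical landing bucket
CURATED_PATH = "s3://example-curated-bucket/events/"   # hypothetical curated target

# Discover raw CSV objects under a prefix.
s3 = boto3.client("s3")
keys = [
    obj["Key"]
    for obj in s3.list_objects_v2(Bucket=RAW_BUCKET, Prefix="events/").get("Contents", [])
    if obj["Key"].endswith(".csv")
]
if not keys:
    raise SystemExit("No new CSV objects to process")

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Read the raw files, derive a partition column, and write curated Parquet
# for downstream consumers (e.g. Glue catalog / Athena).
df = (
    spark.read.option("header", "true")
    .csv([f"s3://{RAW_BUCKET}/{k}" for k in keys])
    .withColumn("event_date", F.to_date("event_timestamp"))
)
df.write.mode("append").partitionBy("event_date").parquet(CURATED_PATH)
```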
Basic experience/understanding of the following AWS components and related areas (in order of importance):
- Glue
- CloudWatch
- SNS
- Athena
- S3
- Kinesis (Kinesis Data Streams, Kinesis Data Firehose)
- Lambda
- DynamoDB
- Step Functions
- Parameter Store (a short boto3 sketch follows this list)
- Secrets Manager
- CodeBuild / CodePipeline
- CloudFormation
- Business Intelligence (BI) Experience
- Technical data modelling and schema design (“not drag and drop”)
- Kafka
- AWS EMR
- Redshift
- Basic experience in networking and troubleshooting network issues.
- Knowledge of the Agile Working Model.
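As a small illustration of a few of the AWS components above, the sketch below uses boto3 to read a configuration value from SSM Parameter Store, fetch a credential from Secrets Manager, and publish a completion notice to SNS. The parameter name, secret ID, and topic ARN are all hypothetical.

```python
# Minimal boto3 sketch touching Parameter Store, Secrets Manager, and SNS.
import json
import boto3

ssm = boto3.client("ssm")
secrets = boto3.client("secretsmanager")
sns = boto3.client("sns")

# Hypothetical pipeline configuration stored in Parameter Store.
target_table = ssm.get_parameter(Name="/data-pipeline/target-table")["Parameter"]["Value"]

# Hypothetical database credentials stored as a JSON secret.
db_creds = json.loads(
    secrets.get_secret_value(SecretId="data-pipeline/db-credentials")["SecretString"]
)

# Notify subscribers (e.g. an ops channel) that a load completed.
sns.publish(
    TopicArn="arn:aws:sns:eu-west-1:123456789012:data-pipeline-events",  # hypothetical ARN
    Subject="Load complete",
    Message=f"Loaded data into {target_table} as {db_creds['username']}",
)
```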
Desired Skills:
- Big Data pipelines
- AWS
- Terraform
- Python
- Data Engineer