Data Scientist (Remote) – Gauteng Johannesburg

Date: 2024-08-23

A software engineering and data science consultancy based in Johannesburg is seeking a Data Scientist to join its South African team. The company works with some of the largest enterprises in Africa, as well as a diverse range of SMEs and startups. The ideal candidate is excited to collaborate with the company’s founding team, eager to carve their own path, and able to thrive in an environment that supports high growth and learning. A strong sense of urgency, a willingness to learn, and the ability to take initiative are essential qualities for this role.

Excellent communication, analytical, and decision-making skills in collaborative environments.

Work with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions.

Mine and analyse data from company databases to drive optimization and improvement of product development, marketing techniques, and business strategies.

Assess the effectiveness and accuracy of new data sources and data-gathering techniques.

Develop custom data models and algorithms to apply to data sets.

Use predictive modeling to improve and optimize customer experiences, revenue generation, ad targeting, and other business outcomes.

Must-have skills:

Experience using Python (SQL optional) to manipulate data and draw insights from large data sets.

Experience with open-source data science libraries and packages (pandas, PySpark, Dask, NumPy, scikit-learn, TensorFlow/PyTorch, Hugging Face Transformers, OpenCV).

Experience building and applying advanced machine learning algorithms and statistical methods, covering both traditional ML (regression, simulation, scenario analysis, time series forecasting, clustering, decision trees) and deep learning (neural networks).

Strong knowledge and experience in data cleaning, transformation, and standardisation techniques (text mining, database record linkage, log data, etc.).

Strong understanding of version control and related concepts and techniques (e.g. Git).

Strong understanding of containerization technologies (e.g. Docker).

Nice-to-have skills:

Experience setting up and maintaining production data and MLOps pipelines (training, inference, model monitoring, etc.).

Experience using AWS services: Redshift, S3, Lambda functions, Kinesis, Glue, SageMaker, etc.

Strong experience with relational database management systems, e.g. PostgreSQL, MySQL, MS SQL Server.

Bonus: Understanding of Python web app frameworks (e.g. FastAPI).

