Data Scientist required for a private health care provider in Johannesburg.
Duties include, but are not limited to:
- Assist with research on trends in Data Science, specifically their application in the health care industry.
- Collaborate with the business to identify requirements as well as opportunities for improving business processes and creating value for the business.
- Partner with business stakeholders to define approaches to resolving key business problems and focus on the development of new business strategies.
- Identify expected outcomes of modelling.
- Assist in developing conceptual designs or models to address business requirements.
- Collaborate with subject matter experts to select the relevant sources of data and information.
- Identify available and relevant data by leveraging existing collection processes, and identify new data collection sources such as social media.
- Partner with the Data Engineering team, where required, to manage data ingestion.
- Recommend third-party sources of information to extend the company’s data when required.
- Perform pre-processing of data which includes, but is not limited to:
  - Data manipulation
  - Transformation
  - Normalisation
  - Standardisation
  - Visualisation
- Derive new variables/features as applicable to developing specific algorithms or models.
- Execute code reviews on existing assets and highlight relevant performance or data related issues.
- Use data profiling and visualisation to understand and explain data characteristics that will inform modelling approaches.
- Perform feature engineering as applicable for building algorithms and models using machine learning techniques.
- Identify, create and implement the appropriate algorithm to discover patterns.
- Identify and implement the appropriate data mining/statistics/machine learning techniques.
- Enhance, find patterns in and build models on large data sets using distributed data processing and analysis methodologies.
- Apply data mining techniques and perform statistical analysis on large data sets.
- Develop experimental design approaches to validate findings or test hypotheses.
- Analyse, interpret and explain results using appropriate statistical tools and techniques, translating findings into clear, actionable and timely insights.
- Validate analysis using appropriate techniques (applying test data sets, A/B testing, scenario modelling, etc.).
- Productionise models using standard processes and techniques.
- Monitor the predicted outcomes of models.
- Understand business requirements to ensure that models are delivered in an appropriate format.
Business Support
- Ability to build, analyse and interpret numerical and non-numerical data to determine potential statistical inferences to inform business and clinical decisions.
- Ability to apply statistical machine learning techniques to predictive modelling.
- Ability to clean and unify messy and complex data sets for easy access and analysis, including combining structured and unstructured data.
- Ability to provide detailed explanations (visually and verbally), representing information in the form of a chart, diagram or picture, using tools such as Kibana, Tableau, Power BI, etc.
- Write programming code based on a prepared design.
- Understand leading-edge technologies and best practice around Big Data, platforms and distributed data processing, e.g. the Hadoop ecosystem (distributed computational power): HDFS, Spark, Kafka.
- Ability to conceptualise and frame a problem, develop hypotheses, identify objective measures to estimate the accuracy of machine learning/statistical processes, and perform testing and validation with careful experiments.
- Understanding of data flows, ETL and processing of structured and unstructured data within the data architecture.
- Ability to produce comprehensive solution designs based on a good understanding of the Big Data architecture.
Minimum requirements
- NQF Level 7 – Bachelor’s degree or Advanced Diploma in the area of statistics, computer science, engineering or mathematics.
- Relevant data science certifications in areas such as Python, Microsoft ML, AWS, Hadoop, big data, machine learning and cloud infrastructure.
- Certification in SQL and working with large-scale data sets.
- Project Management qualification or agile certification such as Scrum or Prince2.
- An advanced level of Computer Literacy and proficiency in MS Office applications.
- A minimum of 4 years’ experience in data science related initiatives or projects.
- Experience with SQL and working with large-scale data sets.
- Practical experience applying machine learning techniques.
- Experience working in agile development teams.
- Experience in operationalising data science solutions or similar product development.
- Experience in a high-scale production environment is critical.
- Experience with Python/Microsoft ML and tools available within the machine learning ecosystem.
- Proven track record in business process analysis, systems and data analysis.
- Solution-focused, with a strong collaborative mind-set.
- Excellent organisational skills: organised and structured.
- Outstanding problem solving and analytical skills.
- Knowledge and understanding of the data science process including, but not limited to:
  - Data profiling
  - Feature selection
  - Data modelling
  - Model evaluation
  - Production and implementation
  - Monitoring
- Knowledge of trends and developments in the health care industry.
- Business and clinical knowledge that will contribute to exposing patterns.
Desired Skills:
- data science
- scientist
Desired Work Experience:
- 2 to 5 years