Knowing what people will do before they do it is a useful skill. That’s why predictive analytics has the attention of the business intelligence (BI) world. Advances in statistical technology and computing power have made it possible to significantly improve predictive analytics, says Johan Jurd, MD of InfoBuild, an Information Builders representative in SA.
Predictive analytics is a data-driven extension that complements other BI capabilities, such as querying and reporting, OLAP and data visualisation. It combines these technologies so users can analyse past and current performance and make predictions. The primary benefit is the ability to take appropriate, actionable steps to address the future.
Combining BI with predictive analytics can produce new levels of insight that were not possible before, but successful implementation requires establishing a repeatable process.
The following are a few best practices for its implementation:
Dual projects
Predictive analytics and BI should be treated as two distinct but linked projects, since each depends on the other.
The modeling project must have a clear objective. Not knowing what direction to take is a recipe for failure, leading to incorrect modeling output, missed deadlines and veering outside the project's scope and budget. The business case and goals must be identified up front.
Criteria for success and failure
Know the criteria that define success and failure.
Document the project’s vision. Lock in the scope and get buy-in from stakeholders. Buy-in is important for any project. With predictive analytics, this means committing stakeholders to aggressively applying results to decision making.
Select a methodology. This is a requirement for both BI and predictive analytics projects. Keep in mind the methodology differs from BI to predictive analytics projects (RUP/Agile/Waterfall for BI and CRISP-DM for predictive analytics). Make sure users know the data as well as relevant internal and external factors.
The CRISP-DM methodology (as shown on the diagram) is recommended for predictive analytics and is summarised in the following six steps. Keep in mind that the majority of the time is spent in the first three steps. This is an iterative process:
* Business understanding – the vision/requirements should be clear. There may be a need for further elaboration during the predictive analytics project;
* Data understanding – not knowing the data is a setup for failure;
* Data preparation – data specification, data cleaning and variable transformation are all a part of this step;
* Modeling – the model is created;
* Evaluation – outputs are validated on testing data; and
* Deployment – using the models in applications. Further predictive analytics may be needed for tweaking purposes.
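The iterative flow of these six phases can be sketched in outline. The following is a hypothetical Python skeleton, not code from any specific tool; the function bodies, the churn objective and the 0.75 success metric are all illustrative stand-ins:

```python
# Illustrative sketch of the CRISP-DM loop; each function stands in
# for the real work done in the corresponding phase.

def understand_business():
    # Phase 1: capture the vision/requirements as a concrete objective.
    return {"objective": "predict customer churn", "success_metric": 0.75}

def understand_data(objective):
    # Phase 2: profile the available data against the objective.
    return {"rows": 10000, "target": "churned"}

def prepare_data(profile):
    # Phase 3: specification, cleaning and variable transformation.
    return [{"tenure": 12, "churned": 1}, {"tenure": 48, "churned": 0}]

def build_model(dataset):
    # Phase 4: fit a model (here, a trivial rule on customer tenure).
    return lambda row: 1 if row["tenure"] < 24 else 0

def evaluate(model, dataset, threshold):
    # Phase 5: validate outputs against the agreed success metric.
    hits = sum(model(row) == row["churned"] for row in dataset)
    return hits / len(dataset) >= threshold

# The process iterates until the evaluation criteria are met.
spec = understand_business()
for attempt in range(3):
    profile = understand_data(spec["objective"])
    data = prepare_data(profile)
    model = build_model(data)
    if evaluate(model, data, spec["success_metric"]):
        break  # Phase 6: deploy the model into applications
```

The loop structure reflects the point above: evaluation can send the project back to data understanding or preparation before deployment ever happens.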
Model selection
Always select models first. Get assistance from experienced data mining practitioners. Knowing which models to use allows for the proper sourcing of data. For example, the logistic regression technique requires a binary target, and survival analysis requires two targets (time and status).
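A pre-modeling check that the sourced data matches the chosen technique's requirements could look like the following. This is a hypothetical Python sketch; the column names ("churned", "months_to_event", "event_status") are illustrative, not from any real schema:

```python
# Sketch: verify sourced data matches the chosen technique's requirements.

def check_logistic_regression(rows, target):
    # Logistic regression requires a binary (0/1) target.
    values = {row[target] for row in rows}
    return values <= {0, 1}

def check_survival_analysis(rows, time_col, status_col):
    # Survival analysis requires two targets: time and status.
    return all(time_col in row and status_col in row for row in rows)

customers = [
    {"churned": 1, "months_to_event": 3, "event_status": 1},
    {"churned": 0, "months_to_event": 24, "event_status": 0},
]

print(check_logistic_regression(customers, "churned"))  # True
print(check_survival_analysis(customers, "months_to_event", "event_status"))  # True
```

Running such checks before modeling begins surfaces data-sourcing gaps early, rather than midway through the modeling phase.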
Model validation
Be sure to validate models, keeping the following in mind:
* Use the predefined success and failure criteria to validate the model;
* When models are ready to deploy, consider this a deliverable of the predictive analytics project;
* Understand that each modeling technique may require data that is different from other techniques;
* Understand that each model output will differ from another model output;
* Understand that each model output can be evaluated differently from another model output;
* Understand that there are multiple approaches to solving the same problem;
* Benchmarking modeling output is difficult; it varies by industry, the data used, the values used, the techniques used and the business cases. Sometimes there are no benchmarks, since the business case and solution have not been done before;
* Don’t try to do too much with too little. Sometimes more than one model output is needed to handle multiple business cases; try not to lump them into one model.
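Validating against the predefined criteria can be as simple as scoring the model on held-back data and comparing the result to the success threshold agreed with stakeholders. The following is a minimal Python sketch; the model rule, the test records and the 70% threshold are all made-up illustrations:

```python
# Minimal sketch: validate a model against predefined success criteria
# on held-back test data.

SUCCESS_THRESHOLD = 0.70  # agreed with stakeholders before modeling began

def model(row):
    # Stand-in for a trained model: predict churn for short-tenure customers.
    return 1 if row["tenure_months"] < 12 else 0

test_data = [  # held back from training, used only for evaluation
    {"tenure_months": 6,  "churned": 1},
    {"tenure_months": 30, "churned": 0},
    {"tenure_months": 8,  "churned": 0},
    {"tenure_months": 40, "churned": 0},
]

correct = sum(model(row) == row["churned"] for row in test_data)
accuracy = correct / len(test_data)
passed = accuracy >= SUCCESS_THRESHOLD
print(f"accuracy={accuracy:.2f}, success={passed}")  # accuracy=0.75, success=True
```

Because the threshold was fixed before modeling began, a failing result sends the project back into the iterative CRISP-DM loop rather than prompting an after-the-fact redefinition of success.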
It is important to understand each step and to keep in mind that this is an iterative process. Models do not come out perfect on the first try; a "perfect" model should be questioned, as should the data used to create it. Understand that having too much or too little data can affect model accuracy. It is also not necessary to use all the data available.
Avoid the following misconceptions about predictive analytics:
* Predictive analytics is new. It has been around since 1930, when Fisher and Durand created the first credit score model;
* It produces perfect predictions. Accuracy depends on the data, and models are estimates;
* It offers push-button solutions. Tools cannot provide everything; experienced practitioners should select the technique based on business context;
* It can be built and forgotten. All models depend on the data provided, and data has cutoff time periods, so models can become outdated. A refresh is required, but the interval varies by customer, industry and business case.
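The "build it and forget it" misconception can be countered with a simple staleness check on the model's training-data cutoff. This is a hypothetical sketch; the 180-day refresh window is illustrative, since the right interval varies by customer, industry and business case:

```python
from datetime import date, timedelta

# Sketch: flag a model for refresh when its training-data cutoff is too old.
REFRESH_AFTER = timedelta(days=180)  # illustrative window, not a standard

def needs_refresh(training_cutoff, today):
    # True when the data behind the model has aged past the refresh window.
    return today - training_cutoff > REFRESH_AFTER

print(needs_refresh(date(2024, 1, 1), date(2024, 12, 1)))   # True
print(needs_refresh(date(2024, 10, 1), date(2024, 12, 1)))  # False
```

Scheduling a check like this alongside deployment keeps the refresh decision explicit instead of leaving models to decay silently.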
Success factors
Here are three factors that lead to successful BI and predictive analytics implementations:
* The availability of useful data that pertains to the business case. Sometimes there isn’t enough data;
* Knowledge of business domains and data; and
* Experience.
In summary, the keys to successful BI and predictive analytics implementations are:
* Sufficient data to work with;
* Process and documentation;
* Understanding the techniques to be used; and
* Buy-in and commitment to apply the results.