As awareness grows about the importance of having high-quality data to avoid
organisational inefficiencies, delays and downright disasters, more and more
companies are tempted to go the DIY route.
In truth, DIY data quality is a little like DIY electrical installation: in
theory it could save time and money, but in practice, unless you have all
the skills you need, it could cost more in the long run – and be
dangerous to boot.
One of the key issues is that, terminology aside, data quality is more a
business problem than a technical one. Data for its own sake is pointless –
it's only the users of that data who give it value.
IT departments on their own cannot make decisions about data quality
(content), data models (structure) and data semantics (meaning) – it takes a
high level of involvement from the rest of the business.
There are proven methodologies that can help secure the necessary support
from the business so that everyone works together toward common goals – but
even these take some skill to implement. Often there is no substitute for
having an independent third party involved.
IT departments without specialist skill in data quality management may also
lack knowledge of the software tools they need to do the job. In the worst
cases we've seen people resort to manual coding, which is a painful
re-invention of the wheel as well as an invitation to failure.
It's no small matter to write code for tasks like data profiling, matching,
merging and textual parsing – this is complex software development, with
all the attendant messy, ongoing maintenance and skills-retention
problems. It's far simpler to use one of the many excellent off-the-shelf
data quality tools that have all the necessary functions built in, such as
statistically based algorithms for fuzzy matching and de-duplication. These
can deliver trusted results without the pain.
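To give a sense of what even the most basic fuzzy matching involves, here is a minimal Python sketch built on the standard library's difflib. It is an illustration only and not one of the statistically based algorithms found in dedicated tools; the sample customer names, the helper names and the 0.85 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_duplicates(names, threshold=0.85):
    """Return every pair of names whose similarity meets the threshold.

    The threshold is an assumed value; in practice it has to be tuned
    against real data and agreed with the business users.
    """
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical sample records, for illustration only.
customers = ["Acme Trading Ltd", "ACME  Trading Ltd.", "Zenith Holdings"]
pairs = likely_duplicates(customers)
print(pairs)
```

Even this toy version compares every record with every other, so the work grows quadratically with the number of records, and it knows nothing about abbreviations, nicknames or transposed fields. Those are the gaps that hand-written code has to fill and then maintain.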
There is a caveat, however – organisations should not imagine that buying a
new software tool will solve their problems, any more than buying a
voltmeter will turn your average householder into an electrician.
Choosing the right tools and knowing how to use them properly takes skill.
In theory these skills could be developed in-house, but here we run up
against a third problem: what IT department isn't already over-stretched
just keeping up with daily operations and existing projects? Data cleanups
are usually seen as unexciting, hard-slog projects with ongoing scrap and
rework.
Once again, it will probably be cheaper in the long run to find a group of
experts who've done it before and can dedicate full-time effort to the task
until it's finished.
There are some less predictable obstacles in the way of DIY data quality as
well. For starters, measuring data quality is easier said than done. Which
of the multitude of commonly accepted data quality metrics (completeness,
consistency, timeliness, accuracy, integrity, etc) will you choose? And once
you've chosen, how will you measure them? Can you measure your data quality
at the level of attributes, records, entities, databases and business areas?
How do you assess data alignment across multiple systems? Entire books have
been written on the subject, which is a specialist area few IT professionals
have mastered.
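As a small illustration of what measuring just one of these metrics can look like, the Python sketch below computes completeness at two of the levels mentioned above, the attribute and the record. It assumes that a value counts as missing when it is absent, None or blank; the field names and sample records are hypothetical, and a real programme would need the business to agree on that definition first.

```python
def is_missing(value):
    """Treat None and empty or whitespace-only values as missing (an assumption)."""
    return value is None or str(value).strip() == ""


def attribute_completeness(records, attributes):
    """Share of records with a usable value, calculated per attribute."""
    total = len(records)
    if total == 0:
        return {attr: 0.0 for attr in attributes}
    return {
        attr: sum(not is_missing(r.get(attr)) for r in records) / total
        for attr in attributes
    }


def record_completeness(record, attributes):
    """Share of the listed attributes that are filled in for one record."""
    filled = sum(not is_missing(record.get(attr)) for attr in attributes)
    return filled / len(attributes)


# Hypothetical customer records, for illustration only.
customers = [
    {"name": "A. Smith", "email": "a.smith@example.com", "phone": "555-0100"},
    {"name": "B. Jones", "email": "", "phone": None},
    {"name": "C. Brown", "email": "c.brown@example.com", "phone": "  "},
    {"name": "D. Patel", "email": None, "phone": "555-0199"},
]
fields = ["name", "email", "phone"]

by_attribute = attribute_completeness(customers, fields)
by_record = [record_completeness(c, fields) for c in customers]
print(by_attribute)
print(by_record)
```

Completeness is the easiest of the metrics to compute. Accuracy, consistency and alignment across systems all need reference data or business rules that code alone cannot supply.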
Even worse, once you know what you're going to measure and what your goals
are for your data, how will you make it happen?
Getting data quality right is very far from being a one-off project – it
needs an ongoing governance programme that outlines what the organisation's
data strategy and policies are, what tasks need to be performed and by whom,
who will lead the strategy and more.
In the absence of this ongoing programme, the week after a data quality
exercise ends everyone goes back to the old way of doing things, without
thought for those who need to use what they're creating later on. Defining
and setting up this data governance programme is a political minefield.
In the end, data quality problems are rooted in business processes and
people issues over which IT departments have little or no control, even if
they can identify these root causes. Cleansing the data you have is no good
if there's a constant flow of new, dirty data coming in.
This means that better data quality requires not just introducing new or
improved business processes, but also training everyone in the organisation
so that they understand the true role, value and importance of the
information they're working with.
Once again, these are tasks way beyond the scope of even the
best-resourced IT departments. Just as it's sensible to call in an
electrician if you're planning to rewire your house, it's sensible to call
in data quality experts if you're planning to overhaul your data management.
Outside experts will not only have the specialist knowledge you need,
they'll also be more easily able to negotiate the political battles involved
in getting data right.
In short: if your organisation has a problem with data quality (and most
do), fix it now. Don't first attempt to document what you already have, or
wait to confirm your business rules – that will just compound the problem
and delay the solution. By the time you finish, they will probably have
changed anyway.
Get objective advice and implement a structured, disciplined data quality
programme from the ground up. Your bottom line will thank you for it.