Too Little, Too Late!
Data Quality Measures in Business
Introduction
In the 1990s, companies began to implement business applications with the goal of improving their business processes; examples include ERP, SCM and CRM systems.
In 2002, Gartner published a study containing the following statement:
"More than 50% of business intelligence and customer relationship management deployments would suffer limited acceptance, if not out right failure, due to lack of attention to data quality issues."1
Now, in a press release this year, they are predicting the same figure for 2007.2
Though I was unable to find any figures to support this claim, I still find it a worrying statement. In the information age it is key for businesses to have accurate and up-to-date information to work with, yet currently businesses focus on problems separately as they arise rather than looking at their origins.
In this paper I intend to look at the problem of data quality (DQ) and the ways in which it is being addressed.
What is Data Quality?
Quality is a hard thing to define, but can be described roughly as fitness for a given task. According to Larry English, data quality involves "consistently meeting knowledge worker and end-customer expectations."3 Many different sources cite similar attributes that quality data must possess.
The Data Warehousing Institute defined it through the following attributes:
Accuracy - The data accurately represents reality.
Integrity - The structure of the data is consistent.
Completeness - All necessary data is present.
Consistency - Elements are consistently defined.
Validity - Data values fall within acceptable ranges.
Timeliness - Data is available when needed.
Accessibility4 - Data is easily accessed, understood and used.
A similar list was also published by PricewaterhouseCoopers & DataFlux5 in the same context. Although they did not elaborate on these points, it can reasonably be assumed that they meant the same thing.
Looking at the list, we can see that data quality is not just about accuracy, as might initially be thought. The first five items concern properties that erroneous data commonly violates, and point to problems such as misapplied business rules, duplicate records, and missing or incorrect data values.
The last two items show that the problem runs deeper: data that is perfect from a content and structure point of view is still of no use if knowledge workers cannot access or understand it.
Data missing even one of these attributes may cause problems and is likely to lose a significant part of its value.
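To make these attributes concrete, the sketch below shows how some of them (completeness, validity, consistency) might be checked automatically. It is only an illustration: the customer fields, the valid age range and the duplicate rule are hypothetical and do not come from any of the sources cited above.

```python
# Minimal sketch of automated checks for some of the attributes listed
# above. Field names, the valid range and the duplicate rule are
# hypothetical examples, not taken from the cited sources.

REQUIRED_FIELDS = {"customer_id", "name", "email"}  # Completeness
VALID_AGE_RANGE = range(0, 130)                     # Validity


def check_record(record: dict) -> list[str]:
    """Return a list of quality problems found in a single record."""
    problems = []

    # Completeness: all necessary data is present (and not blank).
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing or empty field: {field!r}")

    # Validity: values fall within acceptable ranges.
    age = record.get("age")
    if age is not None and age not in VALID_AGE_RANGE:
        problems.append(f"age out of range: {age}")

    return problems


def check_dataset(records: list[dict]) -> dict[int, list[str]]:
    """Run per-record checks plus a simple duplicate-key check."""
    report = {i: check_record(r) for i, r in enumerate(records)}

    # Consistency: the same customer_id should not appear twice.
    seen = {}
    for i, r in enumerate(records):
        key = r.get("customer_id")
        if key in seen:
            report[i].append(f"duplicate customer_id: {key}")
        seen[key] = i
    return report


if __name__ == "__main__":
    data = [
        {"customer_id": 1, "name": "Ann Smith", "email": "ann@example.com", "age": 34},
        {"customer_id": 1, "name": "A. Smith", "email": "", "age": 250},
    ]
    for row, problems in check_dataset(data).items():
        for p in problems:
            print(f"record {row}: {p}")
```

Run against the two sample records, the report flags the blank email, the out-of-range age and the duplicate customer_id, showing how several of the attributes can fail at once in a single record.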
Consequences of Bad Data Quality
In the short term, the consequences of bad data quality are wasted money and other resources.
"The direct costs have been measured at anything from 10 to 20 percent of an organisations operating revenue (or budget in not for profit entities), moreover, organisations can waste 40 to 50 percent of their IT budget on "information scrap and rework", in the form of re-runs, fixing requirements, design and coding defects, along with unnecessary redundant data handling and interfacing, and data correction."6
Another consideration is the opportunity cost. These costs cannot be measured directly, and I was unable to find even an estimated value; however, according to Larry English, they can be higher than the direct costs.
"Poor quality information, however, creates opportunity losses that can be even greater than the direct costs. When you alienate customers by misspelling their name, mailing information to the wrong address or sending customers the wrong items, you risk the loss of the customer's lifetime value. Poor quality information causes business intelligence and trend analysis to fail, causing you to sub-optimize marketing campaigns, product development and other opportunity development processes. This prevents you from exploiting opportunities."7
Sources of Bad Data
Many data quality problems have arisen because companies accumulated large amounts of data over a decade or more, commonly collected in a number of different systems. The way this information was spread out, and the way it was initially collected, caused problems such as:
Large data silos forming a disjointed network of data across the separate systems they spanned. Data was not consistently defined or properly connected, and was most probably filled with inaccurate, missing, duplicate or extraneous values.
Unmanaged growth of systems, which left them haphazard and without a predetermined plan for assuring that information was consistent, accurate and reliable.
These problems may well not have been recognised in the early stages, allowing them to escalate unchecked as systems were adapted for uses they were never intended for. From my own personal experience, I have never found an organisation that maintains its databases in third normal form (3NF). I have worked fairly extensively on a database project, and when trying to keep it to the 3NF standard I was encouraged to take the easy route: introduce duplicate data and use workarounds such as entering '' in a string field that cannot be NULL (a sketch of this workaround follows below).
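The damage done by that '' workaround is easy to demonstrate. The following sketch uses Python's built-in sqlite3 module with an invented table: the NOT NULL constraint correctly rejects a missing value, but an empty string slips straight past it, so the database can no longer tell that the data is missing.

```python
import sqlite3

# Hypothetical example table; the names are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

# A genuinely missing value is rejected by the constraint...
try:
    con.execute("INSERT INTO customer (id, email) VALUES (1, NULL)")
except sqlite3.IntegrityError as e:
    print("NULL rejected:", e)

# ...but the '' workaround slips straight past it, silently hiding
# the fact that the email address is missing.
con.execute("INSERT INTO customer (id, email) VALUES (2, '')")

# Completeness checks based on NULL now miss this record entirely.
missing = con.execute("SELECT COUNT(*) FROM customer WHERE email IS NULL").fetchone()[0]
blank = con.execute("SELECT COUNT(*) FROM customer WHERE email = ''").fetchone()[0]
print(f"records with NULL email: {missing}, with '' email: {blank}")
```

Any later completeness check that looks only for NULL values will report this table as fully populated, which is exactly how such problems escalate unnoticed.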
As organisations tried to integrate these different systems across organisational boundaries, they found that their integrated processes and systems failed.
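As a small illustration of why these integrations fail, consider the same customer as recorded in two independently grown systems; every field name and format below is invented for the example. Until the systems agree on shared definitions, even the keys do not match.

```python
# Hypothetical illustration: the "same" customer as stored in two
# separately grown systems. Field names and formats are invented.
billing = {"cust_no": "00421", "name": "SMITH, ANN", "dob": "03/04/1978"}
support = {"customer_id": 421, "name": "Ann Smith", "dob": "1978-04-03"}

# A naive integration keyed on the raw values finds no match at all.
# Note the dates are also ambiguous: "03/04/1978" could mean 3 April
# or March 4 depending on locale, while the other system uses ISO 8601.
print(billing["cust_no"] == support["customer_id"])   # False
print(billing["name"] == support["name"])             # False

# Only after agreeing on a shared definition (here, a normalised
# integer key) can the two records be linked at all.
def normalise_key(value):
    return int(str(value).lstrip("0"))

print(normalise_key(billing["cust_no"]) == normalise_key(support["customer_id"]))  # True
```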
The following list of sources of bad data comes from Dal Cais Research:
On entry - Data is entered incorrectly.
In context - Different words can have different meanings.
In legacy systems with poor data - Often, the data validation processes in place when the system was set up fall short of those required today.
Tracking data across systems - Building a singular view of the problem might result in
...
...