Cleaning Your Data and Keeping It Clean

October 01, 2003, 1:00 a.m. EDT

A recent report by the Data Warehousing Institute claims that the annual cost of poor data quality for U.S. industries is $611 billion. This includes the direct costs of analyzing and correcting data errors as well as indirect costs. For instance, when errors become exposed to customers and regulators, fines can follow, and the backlash can force an avalanche of expensive changes to how an insurance company conducts its business.

These costs can undermine the breakthrough insight that a data warehouse has long promised: managers using customer data to develop targeted new products and gain market share; actuaries using data to more accurately price risk and evaluate loss reserves; and agents using data to grow and maintain customer relationships.

Today's environmental factors make this an even more pressing issue for insurance companies. New channels and increased competition force today's insurers to face a new set of challenges. They must better understand the effectiveness, profitability and interactions of distribution channels, and be able to act to optimize those channels.

Therefore, economically cleansing data, and keeping it clean, must be a high priority for insurers (see "The Industry's Dirty Secret," cover).

Data sources

Cleansing data begins with understanding the company's data sources. In the insurance industry, those sources are proliferating fast.

External sources include government organizations, credit and claims bureaus, broker channels, third-party consumer marketing information, demographic information, and online consumers. Internally, as insurers have recognized the business potential of making data accessible beyond the "data elite," sales and service personnel, captive agents, and back-office operations have all begun to gather and input data.

Unfortunately, many of the sources are rife with the potential for dirty data. For example, the Data Warehousing Institute research shows that employees make 76% of the data entry errors that help account for bad data.

An increasing number of carriers are beginning to consolidate data from their multiple legacy systems or data marts to a centralized data warehouse. The ultimate goal is creation of an enterprisewide view of their customers and their business.

As a data consolidation project begins, most companies use an ETL (Extract, Transform, and Load) process that examines the incoming data for errors and problems and, ideally, addresses those problems during the "transform" part of the process. Data cleansing achieved this way is notorious for cost and time overruns, but there has been considerable improvement in the process.

In fact, one leading West Coast insurer has taken advantage of advances in data warehousing technology, especially the idea of parallel architecture, to reverse the last two steps of ETL, turning it into an Extract, Load, and Transform (ELT) process.

In other words, the data is loaded into the warehouse first, and the cleansing is done there rather than in a separate step beforehand. In its data consolidation project, this insurer believes that the switch to ELT saved it three to six months, even as it reduced its dirty data from 20% to zero.
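To make the reversal concrete, here is a minimal sketch of an ELT flow. It is an illustration only, not the insurer's actual implementation: it uses Python's built-in sqlite3 module as a stand-in for a parallel data warehouse, and the table names, column names, sample records and cleansing rules are all hypothetical. The raw rows are loaded untouched into a staging table, and the cleansing then runs as a single SQL statement inside the database.

```python
import sqlite3

# Hypothetical raw policy records, as they might arrive from a legacy system.
RAW_ROWS = [
    ("P-001", " Alice Smith ", "ca", "1200.50"),
    ("P-002", "Bob Jones", "CA", "not available"),   # premium is not a number
    ("P-003", "Carol Lee", "wa", "980"),
    ("P-004", "", "OR", "500"),                       # missing customer name
    ("P-001", " Alice Smith ", "ca", "1200.50"),     # exact duplicate
]


def run_elt(rows):
    """Load raw rows first, then cleanse them inside the database."""
    conn = sqlite3.connect(":memory:")

    # Extract + Load: raw data goes into a staging table exactly as received.
    conn.execute(
        "CREATE TABLE staging (policy_id TEXT, name TEXT, state TEXT, premium TEXT)"
    )
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", rows)

    # Transform: the cleansing rules (assumptions of this sketch) run as SQL
    # inside the database. Non-numeric text casts to 0.0 in SQLite, so the
    # premium check drops it; it also drops zero or negative premiums.
    conn.execute(
        """
        CREATE TABLE policies AS
        SELECT DISTINCT
            policy_id,
            TRIM(name)               AS name,
            UPPER(TRIM(state))       AS state,
            CAST(premium AS REAL)    AS premium
        FROM staging
        WHERE TRIM(name) <> ''
          AND CAST(premium AS REAL) > 0
        """
    )

    clean = conn.execute(
        "SELECT policy_id, name, state, premium FROM policies ORDER BY policy_id"
    ).fetchall()
    conn.close()
    return clean


if __name__ == "__main__":
    for row in run_elt(RAW_ROWS):
        print(row)
```

In a conventional ETL flow, the same rules would run in a separate program before any row reached the database. Here they run where the data already sits, which is the step a parallel warehouse can spread across its processors.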

Because insurers use numerous data sources, they can't rely on the initial cleansing process. An ongoing data stewardship program aimed at ensuring a constant stream of clean and reliable data is essential.

An effective data stewardship program is characterized by cross-functional teams that include representatives from key business and technical areas, including marketing, underwriting, sales, claims, finance, legal and IT.

Such teams not only ensure that business people understand their role in maintaining the quality of the data, but also increase business-side awareness of what data exists and where to find it, making it more likely that business users will make full use of the data.

Unfortunately, fewer than half of all companies nationwide have a formal data stewardship program in place. Even those that have a program may not be realizing all the benefits, because they are not necessarily aware of the best practices that have emerged.

Evaluation process

Those best practices begin with a formal evaluation process to help measure data quality costs and benefits, understand the data value chain, readily view where company data resides, and prioritize data quality efforts.

Evaluations include an examination of overall data quality, a gap analysis to identify organizational holes that lead to bad data, a process review of existing data management programs, an evaluation of communication plans, and identification of additional business users to increase the value and return on investment of data-related projects.
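As one illustration of what an examination of overall data quality might measure, the sketch below computes the share of records that fail a validity rule, field by field and overall. The field names, the rules and the sample records are hypothetical, and field values are assumed to be strings. A real evaluation would draw its rules from the business and technical areas that own the data.

```python
import re

# Hypothetical validity rules, one per field. Each returns True for a good value.
RULES = {
    "policy_id": lambda v: bool(re.fullmatch(r"P-\d{3}", v)),
    "state": lambda v: bool(re.fullmatch(r"[A-Z]{2}", v)),
    "premium": lambda v: bool(re.fullmatch(r"\d+(\.\d+)?", v)),
}

# Hypothetical sample records; all values are strings.
SAMPLE = [
    {"policy_id": "P-001", "state": "CA", "premium": "1200.50"},
    {"policy_id": "P-002", "state": "ca", "premium": "300"},     # bad state
    {"policy_id": "", "state": "WA", "premium": "abc"},          # bad id, bad premium
    {"policy_id": "P-004", "state": "OR", "premium": "75"},
]


def profile(records, rules=RULES):
    """Return the fraction of records failing each rule, plus an overall rate."""
    total = len(records)
    if total == 0:
        return {}

    report = {}
    for field, is_valid in rules.items():
        # A missing field is treated as an empty string, which fails every rule.
        bad = sum(1 for r in records if not is_valid(r.get(field, "")))
        report[field] = bad / total

    # A record counts as dirty if any one of its fields fails its rule.
    dirty = sum(
        1
        for r in records
        if any(not is_valid(r.get(field, "")) for field, is_valid in rules.items())
    )
    report["any_field"] = dirty / total
    return report


if __name__ == "__main__":
    for name, rate in profile(SAMPLE).items():
        print(f"{name}: {rate:.0%} of records fail")
```

Run on a schedule against each data source, a report of this kind gives a stewardship team a simple number to track over time and a way to see which fields and sources deserve attention first.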

An evaluation of this type is the first step in creating an effective data stewardship program. Creating a visible and vigilant process to guard against the "garbage in, garbage out" syndrome will maximize the accessibility, reusability, and quality of a company's data. It is one key to achieving the kind of return on investment in data and technology that the insurance industry has long believed is there for the taking.

William Sinn is vice president, insurance and healthcare marketing, for Teradata, a division of NCR Corp.