Synthesizing Data for a New View

As the insurance industry completes the transition from the age of data scarcity to the age of data abundance, the question now becomes how best to put this surfeit of information to use.

One way to capitalize on this digital bounty is to integrate existing data streams to create something more useful, particularly for business analytics. These hybrid data points are often grouped under the term "synthetic data" (one succinct definition of synthetic data is simply the creation of data that does not currently exist). Combining internal and external information is one common way to produce synthetic data. Information captured during one internal process (e.g., underwriting) may also be synthesized with data culled from elsewhere in the enterprise (e.g., claims) to produce heretofore unknown insights. Likewise, an insurer could blend data from a variety of third-party sources to gain new knowledge.

To be sure, given the menagerie of data types carriers possess (producer, survey, customer, third-party, claims, structured and unstructured, etc.), process is key when crafting synthetic data. While a healthy measure of experimentation is necessary to arrive at novel data elements, the goal is to emulate the meticulous, methodical work of Gregor Mendel rather than the capricious improvisations of a Victor Frankenstein.

One fertile area of the insurance enterprise where synthetic data is making inroads is the underwriting process. By combining existing data in new ways, carriers can craft synthetic data points that are more granular and thus more useful in achieving a finer segmentation of risk. For example, an underwriter could pair address data culled from a policy administration system with geospatial data from a third-party vendor that reveals the position of fire hydrants to get a better sense of how to price fire risk at a given location. "To get the most granular price, you need the most granular data," explains Nancy Hoppe, SVP and chief pricing actuary, Zurich North America Commercial.
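A minimal sketch of what such a pairing might look like in Python, assuming hypothetical policy and hydrant records with latitude and longitude fields (the field names, coordinates and derived feature are illustrative, not drawn from any carrier's actual systems):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3956 * 2 * asin(sqrt(a))  # Earth radius of roughly 3,956 miles

# Internal policy record (e.g., from a policy administration system) -- illustrative.
policy = {"policy_id": "P-1001", "latitude": 42.3601, "longitude": -71.0589}

# Third-party geospatial feed of fire hydrant locations -- illustrative coordinates.
hydrants = [
    {"hydrant_id": "H-1", "latitude": 42.3605, "longitude": -71.0580},
    {"hydrant_id": "H-2", "latitude": 42.3700, "longitude": -71.0700},
]

# Synthetic data point: distance from the insured location to the nearest hydrant.
nearest = min(
    haversine_miles(policy["latitude"], policy["longitude"],
                    h["latitude"], h["longitude"])
    for h in hydrants
)
policy["miles_to_nearest_hydrant"] = round(nearest, 3)

# The derived feature can then feed a fire-risk pricing model.
print(policy)
```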

Another area where synthetic data exhibits great potential is fraud detection. A synthetic data point that compares an insured's home address to their business address may help detect rate evasion. Likewise, a synthetic data point that merges odometer data from a vehicle claim with underwriting data might help identify drivers who purposely understate miles driven per year in order to qualify for a low-mileage insurance discount.
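A sketch of that second scenario, assuming a hypothetical underwriting record, a hypothetical claim odometer reading and an arbitrary review threshold (none of the figures come from an actual program):

```python
from datetime import date

# Underwriting record: the mileage the insured declared at policy inception (illustrative).
underwriting = {"policy_id": "A-2002", "declared_annual_miles": 7500,
                "inception_date": date(2023, 1, 1), "inception_odometer": 41000}

# Odometer reading captured during a later claim (illustrative).
claim = {"policy_id": "A-2002", "loss_date": date(2024, 1, 1), "odometer": 56500}

# Synthetic data point: implied annual mileage derived by merging the two sources.
days_elapsed = (claim["loss_date"] - underwriting["inception_date"]).days
implied_annual_miles = (claim["odometer"] - underwriting["inception_odometer"]) / days_elapsed * 365

# Flag policies whose implied mileage far exceeds the declared figure (threshold is arbitrary here).
if implied_annual_miles > underwriting["declared_annual_miles"] * 1.5:
    print(f"Review {underwriting['policy_id']}: declared "
          f"{underwriting['declared_annual_miles']} mi/yr, implied {implied_annual_miles:.0f} mi/yr")
```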

Hoppe notes that several of the conditions enabling this confluence of different data streams have emerged only recently. One is ongoing advances in the methodologies used to analyze data. Another catalyst is the continuing fall in the price of data storage. With hard-drive density growing exponentially, and price per terabyte falling in response, it is now commonplace for carriers to have petabytes of data available for analysis. Previously, with storage space at a premium, carriers no doubt excised a good deal of potentially useful information in order to fit it into archaic, technology-defined data structures. "It used to be expensive to store data," Hoppe says.

 

CHALLENGES

Yet it is partially this profusion of cheap storage that engendered one of the primary challenges to wider use of synthetic data by insurers: the sheer volume of data they store. Worse still, as carriers gorged on storage in recent years, many architectural concerns were pushed to the side, notes Oleg Sadykhov, a principal at Farmington Hills, Mich.-based technology consultancy X by 2. Another challenge insurers face is data incompatibility and fragmentation. Much of this problem derives from the multiplicity of policy administration systems many carriers employ, each of which may use different data definitions. This is especially true for insurers that have grown through mergers and acquisitions.

Even with larger architectural concerns settled, there is still work to be done at the application level. Legacy systems and their attendant databases can present a challenge, as the data they contain may have to be modified or remapped to make it usable for a synthetic data initiative. Carriers can go a long way toward solving the issues of abundance and fragmentation by undertaking a thorough architectural review that baselines current issues and expressly defines a strategy for master data management (MDM). "You need to document the fine details of how data is used in an organization," Sadykhov says.
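A simplified illustration of that remapping work, assuming two hypothetical policy administration systems whose invented field names and codes are reconciled into one canonical record:

```python
# Two legacy policy administration systems describing the same concepts differently
# (field names and codes are invented for illustration).
system_a_record = {"POL_NO": "0001234", "CONSTR_CD": "1", "YR_BLT": "1987"}
system_b_record = {"policyNumber": "B-998", "construction": "FRAME", "yearBuilt": 1987}

# Canonical definitions agreed on in the master data management (MDM) review.
CONSTRUCTION_CODES_A = {"1": "frame", "2": "masonry", "3": "fire_resistive"}

def from_system_a(rec):
    return {
        "policy_number": rec["POL_NO"].lstrip("0"),
        "construction": CONSTRUCTION_CODES_A.get(rec["CONSTR_CD"], "unknown"),
        "year_built": int(rec["YR_BLT"]),
    }

def from_system_b(rec):
    return {
        "policy_number": rec["policyNumber"],
        "construction": rec["construction"].lower(),
        "year_built": int(rec["yearBuilt"]),
    }

# Both sources now land in one canonical shape, ready to be combined.
print(from_system_a(system_a_record))
print(from_system_b(system_b_record))
```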

Mike Mahoney, senior director of product marketing, auto casualty solutions, at San Diego-based Mitchell International, agrees that accurate and up-to-date internal data mapping is key. "How you store data may almost be as important as how you combine it," he says. "MDM is a big deal."

Thus, answers to questions about the system of record, how to keep databases in sync and, perhaps most important, who is allowed to update and edit those systems need to be carefully considered before a carrier can achieve optimal results integrating data. Mahoney contends that carriers should stress the concept of concurrency control to ensure that concurrent and ongoing operations produce correct results. Many current database management systems employ algorithms to assist with such control and ensure the database is updated in a consistent manner.
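One widely used pattern for this kind of control is optimistic locking, in which an update succeeds only if the row still carries the version number the writer originally read. The sketch below is a generic illustration of that pattern, not a description of any particular carrier's or vendor's implementation:

```python
import sqlite3

# Optimistic locking with a version column: the update only succeeds if the row
# still carries the version the writer read, so two concurrent editors cannot
# silently overwrite each other's changes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE insured (id INTEGER PRIMARY KEY, address TEXT, version INTEGER)")
conn.execute("INSERT INTO insured VALUES (1, '12 Main St', 1)")

def update_address(conn, insured_id, new_address, expected_version):
    cur = conn.execute(
        "UPDATE insured SET address = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_address, insured_id, expected_version),
    )
    return cur.rowcount == 1  # False means another writer updated the row first

print(update_address(conn, 1, "14 Main St", expected_version=1))  # True: version matched
print(update_address(conn, 1, "99 Elm St", expected_version=1))   # False: stale version rejected
```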

Another technology that helps insurers refine data for use is de-duplication, which employs sophisticated algorithms to weed out redundancies in data and compress it to a fraction of its former size. Major storage vendors IBM, EMC and NetApp offer de-duplication within their product lines, but take different approaches. There also are third-party vendors that will de-duplicate data for carriers as a service.
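In miniature, content-based de-duplication can be sketched as follows; this toy example simply stores each unique chunk once under its hash and is not meant to reflect how any particular vendor's product works:

```python
import hashlib

def dedupe(chunks):
    """Store identical chunks once, keyed by content hash, and keep references."""
    store = {}       # hash -> chunk content, stored only once
    references = []  # the original sequence, expressed as hash references
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        references.append(digest)
    return store, references

chunks = [b"claim header", b"policy terms", b"claim header", b"claim header"]
store, refs = dedupe(chunks)
print(f"{len(chunks)} chunks stored as {len(store)} unique blocks")
```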

 

THE DATA ITSELF

By its very nature, insurance data is not static. Mahoney contends that any meaningful use of synthetic data is going to be heavily reliant upon internal carrier data as a starting point, noting that an estimated 2.5% of the consumer demographic information stored in corporate databases changes each month. At that rate, a complete turnover of the information is possible in a little over three years. "Before we move into synthetic data points, let's make sure we have our arms around our internal aspects," he says. "If your internal data is flawed, you are building on sand."

An indication of the breadth of concern surrounding data quality was evident in a study released by New York-based Novarica in November 2010. The study queried 75 insurance CIOs who are members of Novarica's Insurance Technology Research Council about the challenges they face in leveraging business intelligence. The results indicated that the primary issues all center on data quality, with 50% of respondents citing significant challenges with data inconsistency (data is in different forms) and more than 60% indicating significant challenges with source system data quality (the information is inaccurate or invalid).

The remedy for many data quality woes may be a stringent data validation and scoring process that mandates a set of rules to check data completeness.
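A toy example of such rule-based scoring, with invented field names and validity rules, might look like this:

```python
# Rule-based completeness scoring in miniature: each record is checked against
# a set of required-field and validity rules and given a simple score.
RULES = {
    "zip_code": lambda v: isinstance(v, str) and len(v) == 5 and v.isdigit(),
    "year_built": lambda v: isinstance(v, int) and 1800 <= v <= 2025,
    "construction": lambda v: v in {"frame", "masonry", "fire_resistive"},
}

def score_record(record):
    passed = sum(1 for field, rule in RULES.items()
                 if field in record and rule(record[field]))
    return passed / len(RULES)

records = [
    {"zip_code": "02110", "year_built": 1987, "construction": "frame"},
    {"zip_code": "2110", "year_built": None, "construction": "frame"},
]
for r in records:
    print(f"completeness score: {score_record(r):.0%}")
```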

Data benchmarking is another tool for insurers devising a synthetic data strategy, says George Davis, VP at Boston-based AIR Worldwide. Benchmarking can highlight subtleties or discrepancies that emerge when a company's data is compared against industry distributions. Davis recalls reviewing benchmarking data with an insurer client and realizing that many of the homes covered in a certain book of business were, in fact, mobile homes. The insurer was thus forced to concede that it had either failed to follow its underwriting strategy or had serious shortcomings in the quality of its location data.
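A bare-bones version of that comparison, using invented book and industry figures purely for illustration:

```python
# Benchmarking a book of business against an (invented) industry distribution.
# Large gaps flag segments worth a closer look -- for example, an unexpectedly
# high share of mobile homes in a book written as standard homeowners.
industry = {"frame": 0.55, "masonry": 0.35, "mobile_home": 0.10}
book     = {"frame": 0.40, "masonry": 0.20, "mobile_home": 0.40}

for segment in industry:
    gap = book[segment] - industry[segment]
    flag = "  <-- investigate" if abs(gap) > 0.15 else ""
    print(f"{segment:12s} book {book[segment]:.0%}  industry {industry[segment]:.0%}  gap {gap:+.0%}{flag}")
```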

Indeed, with third-party databases and producers providing so much of an insurer's data stream, a holistic strategy that focuses all members of the insurance value chain on the importance of data quality is vital. Davis notes that major insurers are providing feedback to producers that scores the quality of their data submissions as an incentive to provide better data. "Metrics like this are vital to stimulating improvement," he says, adding that the importance of data quality to synthetic data initiatives is hard to overstate. "If you want data-derived insights, you had better have some good data going into it."

Mahoney agrees that data quality is paramount, and adds that with carriers already awash in data, modesty may be the best policy. "Don't look for too many synthetic data points," he says. "Focus on the goal, not on creating more data."

Zurich's Hoppe concurs, stressing that insurers should not get so carried away with synthetic data that they forget to leverage something that cannot be synthesized: human insight. "While we use technology to assist in risk segmentation, we still put the most stock in the ability of our underwriters," she says. "Technology is great but it is only a starting point." INN

 

 

De-risking Data Conversion

The information technology equivalent of drudgework, data conversion is nonetheless an increasingly important skill as insurers move from legacy to modern systems.

A new report from Boston-based Celent says insurers need to account for data conversion in their strategic planning. "Innovation in business processes is a key area for many insurance companies, and data conversion is almost always part of the transition," says Mike Fitzgerald, senior analyst with Celent's Insurance Group, who authored the report with analyst Craig Beattie.

The authors say one common problem is that carriers fail to recognize and source the specific project management skills required for successful data conversion. "A repeated error identified by both vendors and insurers is applying an application development project management approach to a data conversion effort," the report states. "In reality, the skill sets are different and require separate abilities."

Another misstep the authors cite is the failure by project leads to familiarize themselves with the data they intend to convert. "Frequently, the complexity and cost of a data migration project relates to the effort required to mine and document the fine details of how data is used in an organization," the report states. "This is also an area in which the work effort is traditionally underestimated."

In addition to having the requisite human capital in place, there also are many technology options that can smooth the conversion process. "The good news for insurers is that many vendors and service providers have experience and tooling for data conversion projects," the report states. "In many cases the tools can be rented for the time they're required, and training is offered for internal staff."
