Readying for the Data Deluge

The act of aggregating data for the sake of making better decisions is as old as the insurance industry itself. Another constant is that the tools used to gather and analyze this data have continually evolved.

Now, insurance and other industries that rely heavily on data to augment, and in some cases supplant, human decision making are entering an era in which many feel evolutionary changes are about to give way to revolutionary ones. The term Big Data refers loosely to a set of storage and algorithmic technologies that enable analysis of data sets too large for traditional tools. In terms of data volume, this means the largest insurers may soon be moving out of the realm of gigabytes and terabytes and into the terra incognita of petabytes and beyond.

Yet the size of the data sets is not Big Data's sole story. Many would argue that, for the insurance industry, the novel nature of the data streams comprising these sets is of greater import. Much of this growth is in unstructured data such as audio and video recordings. For example, insurers can now readily avail themselves of video and audio search capabilities or face-recognition technologies to aid in fraud detection. Another copious new source of data for insurers is usage information derived from networked sensors placed on everything from automobiles to mobile devices to buildings.

Whether the information is culled from this nascent Internet of Things, or gleaned from more traditional sources such as social networks or satellites, its proper use will present a challenge for insurers, says Phillippe Torres, CEO of technology consultancy In Edge. "When you start factoring in different data sources that insurers are not used to working with, you get a whole new set of problems," Torres says. "Also, going to 1,000 times more data has big implications for storage and infrastructure."

Donald Light, senior analyst at Celent, agrees that insurers may initially struggle to squeeze all the business value out of these new information sources. "This data may come in forms and formats the insurance industry is not used to working with but it has great intrinsic value," he says.

Anand Rao, a principal with PWC, says this new glut of data will ultimately change how insurance companies operate. "In terms of the amount and types of data available, [Big Data] will be revolutionary," he says. "The question for the insurance industry will be how to convert all this information into something of value."

Few people understand the challenges and potential Big Data presents to insurers better than Swati Abbot, president of Blue Health Intelligence (BHI). Abbot came to her current position from health care analytics provider MEDai, eager to work with the gargantuan set of claims data BHI has on hand from the 54 million annual lives in the Blue Cross network nationwide.

In addition to its vast population size, the data BHI has to work with is deep in a longitudinal sense, going back more than five years. "We have one of the best data assets in the country today," Abbot says. "It's clean data, it goes back several years and it's representative of the whole country. I feel very privileged to lead this company." Abbot says this depth is important because it is safer making predictions when a pattern is observed over multiple years. "One year of data is not enough to tell you if it is a repeatable answer you are getting," she says.

As for the breadth of the data set, it is vital for identifying diseases that are rare but very costly to treat if not diagnosed early. "A smaller data set, with 500,000 people enrolled in a single health plan, is not large enough to contain a significant number of people with low volume disease," she says, adding that as data sets get larger, the algorithms used in models tend to get smarter and better defined. "But when you have the amount of data we do there are no rare diseases. We can always find enough of a population to make a prediction."
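Some back-of-the-envelope arithmetic illustrates why scale matters here. The prevalence figure below is an assumption chosen for illustration, not a BHI statistic:

```python
# Illustrative arithmetic only: the 1-in-100,000 prevalence is an assumed
# figure, not a BHI statistic. Expected case counts scale linearly with
# the number of covered lives.
per_case = 100_000          # assumed: one case per 100,000 people
small_plan = 500_000        # a single health plan's enrollment
bhi_scale = 54_000_000      # lives in the nationwide Blue Cross data set

print(small_plan / per_case)  # 5.0 expected cases -- too few to model
print(bhi_scale / per_case)   # 540.0 expected cases -- a workable cohort
```

At the smaller plan's scale, a handful of cases is statistical noise; at 54 million lives, even a very rare condition yields a cohort large enough to support a prediction.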

One other intrinsic benefit to BHI's claims data is its geographic spread, which enables Abbot and her team to model different practicing patterns used across the nation. "We will often create a generic model based on all our data and all our geographies, but then we create custom iterations based on geography," she says.

Moving forward, Abbot says her primary challenge is to leverage this data set to create analytics that help physicians identify risk drivers and make better decisions. This entails using data mining and pattern recognition to identify connections likely to elude ordinary human cognition. For example, using the data, Abbot's team was able to divine that mental health issues correlate to higher risk for complications following knee replacement surgery. Armed with this knowledge, physicians can now order more screening and follow-up after surgery for patients with these symptom combinations.

"We can mine the data just to see where the trends are," she says. "It doesn't have to be a predictive model. We can focus on the areas that are draining our system and leading to bad outcomes. My mission is to impact health care."

Can more be less?

While the advent of Big Data is of obvious value to health insurers and to lines such as personal auto, where mammoth data sets are often a matter of public record, the benefits of the technology may be tougher to extract for specialty insurers and other low-volume, high-value lines of insurance. "Based on our mix of business, the big thing for us is expanding from traditional insurance lines to more specialized lines," says Lisa Diers, head of technical price and predictive modeling operations for Zurich North America. "Everyone understands auto, but where do I go next and where do I have enough data at a granular enough level?"

What's more, Diers says, in the short run adding more data and metrics to existing models often spells trouble, layering on additional complexity. Diers notes that the difficulty of collecting certain data points can negate their usefulness, such as when they bump up against regulatory gray areas. "At what point does adding another data point become not worth it?"

To counter this, she counsels a regular, structured reexamination of what's going into predictive models. "As it has become easier to get your hands on more data, the question that remains is 'So what?'" she says. "How do I maintain this and keep my analysis fresh if I am looking at hundreds and hundreds of data points?"

Indeed, one of the paradoxes of Big Data is that as it becomes more prevalent, it creates the need for more data: analytics begets analytics. "Right now a modeler may have to make sense of 100 metrics," Torres says. "That's a sandbox that you fit into your headspace. If you have thousands of metrics, you may have to run analytics just to determine which are best."
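A first-pass screen of the kind Torres describes might simply rank candidate metrics by how strongly each correlates with the outcome of interest and keep a shortlist. A minimal sketch, with invented metric names and synthetic data:

```python
import math
import random

# Sketch of a first-pass metric screen: rank thousands of candidate
# metrics by absolute correlation with the outcome, keep the top few.
# All names and data here are invented for illustration.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_metrics(metrics, outcome, top_n=3):
    """Return the top_n metric names by absolute correlation with outcome."""
    scores = {name: abs(pearson(vals, outcome)) for name, vals in metrics.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

random.seed(0)
outcome = [random.gauss(0, 1) for _ in range(200)]
metrics = {"vehicle_age": [y + random.gauss(0, 0.1) for y in outcome]}
metrics.update({f"noise_{i}": [random.gauss(0, 1) for _ in range(200)] for i in range(50)})

print(rank_metrics(metrics, outcome, top_n=1))  # ['vehicle_age']
```

The informative metric surfaces at the top of the ranking while the noise metrics fall away, shrinking a thousand-metric pool to a sandbox a modeler can actually reason about.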

Thus, Torres foresees a need for "meta-analytics," or analytics about analytics. To validate the efficacy of predictive models, Celent's Light recommends a tight interrelation between the "data scientists" creating the models and the operational-level business teams using them on a daily basis. "For maximum business value, the work of quants has to have a feedback loop into operations," he says. "To make sure predictive analysis works, it must be reviewed by business intelligence applications."

Abbot employs a similar hybrid tactic at BHI. "I like to have a split team with analysts who can analyze the data and trends paired with a team of mathematicians and scientists who can really go at the data. They work hand in hand, as one is solving real-life, day-to-day problems and one is creating solutions for tomorrow."

Nonetheless, as modeling becomes more pervasive, there may well be limits to the amount of human intervention that can be afforded. "Even big companies may not have the bandwidth to redo models every three weeks," says Rob Walker, VP of decision management and analytics at Pegasystems.

To avoid this constant tweaking of models, Walker says insurers may have to turn to self-learning models to cut down the workload. "A lot of companies are employing self-learning models, where the models themselves actually learn from every good and bad prediction they make," he says. "The algorithms have really come of age."
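One common form such a self-learning model can take is an online scorer that nudges its weights after every observed outcome, so no periodic manual rebuild is needed. A minimal sketch, with invented features and synthetic data (not Pegasystems' implementation):

```python
import math
import random

# Sketch of a self-learning scorer: online logistic regression that
# updates its weights after every good or bad prediction it makes.
# Feature names and data are invented for illustration.

class OnlineScorer:
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features  # weights start at zero
        self.lr = lr                 # learning rate per update

    def predict(self, x):
        """Score a risk between 0 and 1 via the logistic function."""
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, x, outcome):
        """One gradient step on an observed outcome (1 = claim, 0 = none)."""
        err = self.predict(x) - outcome
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]

random.seed(1)
model = OnlineScorer(n_features=2)
for _ in range(2000):
    x = [random.random(), random.random()]
    outcome = 1 if x[0] > 0.5 else 0   # ground truth depends only on x[0]
    model.learn(x, outcome)            # model adapts as outcomes stream in

# After streaming updates, a high-x[0] risk scores above a low-x[0] one.
print(model.predict([0.9, 0.5]) > model.predict([0.1, 0.5]))  # True
```

Because each prediction-outcome pair refines the weights in place, the model tracks drifting patterns continuously rather than waiting for the next scheduled rebuild.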

Another factor shaping Big Data's usefulness is the speed at which data can be absorbed and delivered to the end user. Here, advances in database architecture and hardware utilization are paying dividends. This year, SAP unveiled its High-Performance Analytic Appliance (SAP HANA) software. The technology makes use of mass parallelization, taking advantage of each core in the multi-core processors now powering servers and then scaling linearly across multiple blades.

Moreover, HANA enables modelers to store massive amounts of data in DRAM for rapid retrieval. According to SAP, HANA-based systems have demonstrated the ability to perform arbitrarily complex queries on more than 450 billion records in a matter of seconds and scale to 1,000 CPU cores and beyond.

"By lessening the possibility of corruption and easing storage issues, in-memory enables you to bring analytics much closer to business processes," says Pat Saporito, senior director of the Global Center of Excellence for Business Intelligence at SAP.

With most of the technical issues addressed, Saporito foresees several factors, such as the impact of consumerism, putting a further premium on analytics in real time and pushing the technology to an operational level. To be sure, everybody in the insurance enterprise, from actuaries to claims adjusters to auditors and agents, could make good use of this real-time capability.

Indeed, it is the operational potential of Big Data that is most appealing to Light. "What's exciting is that just as we're getting a mastery of traditional data sources, the heavens are opening," he says. "It's going to be a big challenge."

Open-source Software Increasing Options

One necessary precursor to insurers entering the Big Data era has been a blossoming of technology to support it.

On the architectural end, advances in database design, such as SAP's in-memory High-Performance Analytic Appliance, enable insurers to store and manage vast amounts of data. A less traditional architectural option is the emerging ecosystem surrounding the open-source framework Hadoop. Originally built as infrastructure for search-engine indexing, Hadoop enables distributed processing of large data sets across clusters of computers using a simple programming model.
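That "simple programming model" is MapReduce: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group. A toy single-machine sketch of the three phases, using the classic word count (Hadoop distributes these same phases across a cluster):

```python
from collections import defaultdict

# Single-machine sketch of the MapReduce flow behind Hadoop:
# map emits (key, value) pairs, shuffle groups them by key,
# reduce aggregates each group. Example data is invented.

def map_phase(records):
    """Emit (word, 1) for every word in every record."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Group emitted values by key, as Hadoop's shuffle phase does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate each key's values; here, sum the counts."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["auto claim filed", "auto policy renewed", "claim closed"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["auto"], counts["claim"])  # 2 2
```

Because map and reduce operate on independent records and independent keys, each phase can be spread across many machines, which is what lets the same three-step program scale from a laptop to a cluster.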

Open-source tools are also cropping up at the application level. In particular, tools based on the statistical computing language R are now coming to market. Created 15 years ago by two statistics professors in New Zealand, R is well established in the academic and research communities for all types of statistical analysis.

Jeff Erhardt, COO of Revolution Analytics, which was founded in 2007 to support the R user community and develop R-based tools aimed at commercial users, says the company has worked on improving the scalability, ease of use and interoperability of R to make it more palatable to corporate environments. "R was designed by statisticians for statisticians, which is both a good and bad thing," he says. "Now, we want to take this tool that dominates academics and research and drive it into the enterprise."

One advantage of R-based software, Erhardt says, is that because R is widely used in academia, students entering the workforce are already familiar with it. Thus, it may help companies recruit workers skilled in advanced statistical analysis.

Moreover, the 2 million-member R user community maintains a library of 3,000 packages, or applets, designed to solve specific statistical problems.

These packages often work well across industries. For example, a package designed for social network analysis could work well for fraud detection. "There's cross-pollination from both a talent and algorithm standpoint," Erhardt says. "It may enable insurers to hire people from different fields."
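The social-network-to-fraud crossover rests on a shared primitive: finding clusters of connected entities. A minimal sketch of that idea, with invented claim data, linking claims that share a contact detail and flagging unusually large connected groups as possible rings:

```python
from collections import defaultdict

# Sketch of the cross-pollination Erhardt describes: the same
# connected-component analysis used on social networks can flag
# possible fraud rings, e.g. claims sharing a phone number or
# address. Claim IDs and links here are invented for illustration.

def connected_components(edges):
    """Return the connected components of an undirected graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:                    # depth-first flood fill
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return components

# Two claims are linked when they share a contact phone number.
links = [("claim1", "claim2"), ("claim2", "claim3"), ("claim4", "claim5")]
rings = [c for c in connected_components(links) if len(c) >= 3]
print(sorted(rings[0]))  # ['claim1', 'claim2', 'claim3']
```

The three-claim cluster surfaces for investigation while the isolated pair does not, which is exactly the community-detection pattern social network packages are built around.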
