CNA Using Big Data to Fight Insurance Fraud

Published January 09, 2015, 12:00 a.m. EST

Updated January 08, 2015, 12:02 p.m. EST

Fraud is a major problem for insurers and the general public alike. The National Insurance Crime Bureau contends that 10 percent or more of P&C claims are fraudulent. Percentages are even higher in workers’ compensation. And the FBI says non-health insurance fraud amounts to $40 billion annually.

Adding in health insurance fraud, the broader impact of fraud on consumer prices and the diversion of public resources to fight fraud, the Coalition Against Insurance Fraud pegs the total cost at $80 billion.

That’s why insurers are hoping that big data and associated analytic technologies will help them more effectively combat fraud. And the successes of early adopters such as CNA Insurance—which is already using a combination of open source big data tools and proprietary analytic appliances to avoid millions of dollars in losses that would likely have otherwise escaped detection—are fueling such hopes.

Tim Wolfe, AVP of Special Investigations at CNA, points out that there are several reasons carriers need to aggressively apply technology to the fraud challenge, rather than continuing to rely too heavily on human adjusters alone.

For one, not all adjusters are created equal. “Some people are just better at recognizing fraud than others,” he says. “In fact, most of the referrals to our investigations unit come from a relatively small number of adjusters.” Training can help—but even that, Wolfe notes, usually just results in a temporary spike in referrals. Plus, an increase in referrals doesn’t help if too many of those referrals are actually legitimate claims.

Adjusters are also under pressure to close claims quickly, and they often have heavy workloads that limit their ability to deeply scrutinize claims for possible “red flags.”

And then there’s the relentlessly increasing sophistication of those who commit insurance fraud. “Fraudsters learn our business and often operate in networks to avoid detection and achieve scale,” he explains. “So insurance companies like CNA clearly have to adapt if we are going to defend ourselves and the public from their schemes.”

CNA’s interest in analytics-driven fraud prevention can be traced back to 2006, when George Fay joined the company as EVP of Worldwide P&C Claims. In addition to being a 30-year insurance industry veteran, Fay had an outstanding military career that included acting as chief investigator of the abuses at Baghdad's infamous Abu Ghraib prison. During that investigation, the military employed sophisticated analytics for unstructured data in order to sift through massive volumes of documentary evidence. This helped strengthen Fay’s confidence in the power of such technology. So when he came to CNA, he charged his team with looking into the possibility of using similar technology to improve discovery of illegitimate claims.

From 2007 to 2011, Wolfe worked with IT to assess available tools and techniques across three main focus areas:

Rules-based analytics that would allow CNA to describe known indicators of fraudulent activity in terms of business logic, which could then be applied to active claims to flag suspicious activity for further investigation.
Ad hoc analytics that would allow CNA to detect anomalous activity or patterns in its active claims that might warrant further investigation, regardless of whether that particular type of anomaly was already associated with a known type of fraud.
Predictive analytics that would allow CNA to detect trends in historical data that might indicate the possibility of networked relationships between claimants—even if no claims in that network were individually flagged based on business rules or ad hoc anomaly detection.
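The first two focus areas can be illustrated with a minimal Python sketch. It is not CNA's implementation: the claim fields, the two rules, and the outlier threshold are all hypothetical, chosen only to show how rules-based flagging differs from ad hoc anomaly detection.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Claim:
    # Hypothetical claim fields, invented for this illustration.
    claim_id: str
    amount: float
    days_since_policy_start: int
    prior_claims: int


# Rules-based analytics: known fraud indicators written as business logic.
# Both rules are made-up examples, not actual CNA rules.
RULES = {
    "early_claim": lambda c: c.days_since_policy_start < 30,
    "frequent_claimant": lambda c: c.prior_claims >= 3,
}


def rule_flags(claim):
    """Return the names of every rule the claim triggers."""
    return [name for name, predicate in RULES.items() if predicate(claim)]


def amount_outliers(claims, threshold=2.0):
    """Ad hoc analytics: return IDs of claims whose amount lies more than
    `threshold` sample standard deviations from the mean, with no rule needed."""
    amounts = [c.amount for c in claims]
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return []
    return [c.claim_id for c in claims if abs(c.amount - mu) / sigma > threshold]


claims = [
    Claim("A1", 1200.0, 400, 0),
    Claim("A2", 1500.0, 12, 1),
    Claim("A3", 1100.0, 800, 4),
    Claim("A4", 1300.0, 365, 0),
    Claim("A5", 1400.0, 200, 0),
    Claim("A6", 1250.0, 90, 1),
    Claim("A7", 25000.0, 500, 0),
]

flagged_by_rules = {c.claim_id: rule_flags(c) for c in claims if rule_flags(c)}
flagged_by_anomaly = amount_outliers(claims)
```

In this toy data, A2 and A3 are flagged because they match a known indicator, while A7 matches no rule and surfaces only because its amount is unusual. That is the practical difference between the two approaches.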

The efforts of Wolfe and his team started to morph in 2010 from conventional business intelligence to big data as they began using technology from SAS and BAE Systems to examine more data from more sources, and as the analytics they ran against that data became more sophisticated and exploratory.

In addition to analyzing data from its own claims and payment systems, for example, CNA integrated information available from third parties such as LexisNexis, Verisk, FICO and the National Insurance Crime Bureau. Data from social media was also incorporated into the analytic environment in order to help detect possible networks of fraud scheme participants. Even the unstructured data from adjusters’ notes eventually got added to the mix.
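One common way to surface possible networks once data from several sources sits side by side is to link claims that share an attribute value, then group everything that is connected. The sketch below does this with a small union-find structure. The article does not describe the technique CNA or its vendors use; the attribute names and sample values here are hypothetical.

```python
from collections import defaultdict


def find_networks(claims):
    """Group claim IDs linked, directly or transitively, by a shared
    attribute value (for example the same phone number or provider)."""
    parent = {cid: cid for cid in claims}

    def find(x):
        # Follow parent links to the group's root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (attribute, value) -> first claim ID carrying it
    for cid, attrs in claims.items():
        for key_value in attrs.items():
            if key_value in first_seen:
                union(first_seen[key_value], cid)
            else:
                first_seen[key_value] = cid

    groups = defaultdict(set)
    for cid in claims:
        groups[find(cid)].add(cid)
    # Only groups with more than one claim are candidate networks.
    return [group for group in groups.values() if len(group) > 1]


# Made-up claims: C1 and C2 share a provider, C2 and C3 share a phone number.
claims = {
    "C1": {"phone": "555-0101", "provider": "P-9"},
    "C2": {"phone": "555-0199", "provider": "P-9"},
    "C3": {"phone": "555-0199", "provider": "P-4"},
    "C4": {"phone": "555-0142", "provider": "P-7"},
}

networks = find_networks(claims)
```

Here C1 and C3 have nothing in common directly, yet they land in the same group through C2. None of the three would necessarily look suspicious when examined alone.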

The Bigger Picture

While Wolfe and his team focus on big data as it relates specifically to fraud, Chief Technology Officer Alok Mehta has to ensure that all of CNA’s business functions—including underwriting, pricing and marketing—are provided with resource-efficient access to the analytic capabilities they need.

This tension between departmental and enterprise imperatives is non-trivial for insurance companies seeking to optimize big data ROI. After all, left to their own devices, individual business functions can easily build multiple big data environments that inefficiently duplicate capabilities found elsewhere in the organization—while also being so tailored to their very specific needs that they can’t be shared.

At the enterprise level, on the other hand, the goal is to create a common big data analytics environment that allows resources and best practices to be efficiently leveraged across all business functions.

Mehta believes he is resolving this tension by crafting an enterprise architecture for big data at CNA that he refers to as its “information hub.” This architecture essentially segments the company’s big data operations into four components:

Source systems that include both internal and third-party sources of various structured and unstructured data feeds that can be accessed via the hub.
A “commodity landing zone” where all data arrives after basic ETL/ELT processes have been performed on it to ensure data quality and consistency.
A “transformational zone” where specific datasets can be appropriately integrated and prepped for analytic processing.
Analytic staging that forks big data processing down one of two paths. One is the type of guided analytics and reporting historically associated with traditional data warehouses. The other path is the more heuristic/discovery-oriented analytics required by business functions that are either in a discovery phase or dealing with highly dynamic operational issues that are not readily subject to rules-based analytic logic.
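The flow through the four components can be sketched in a few lines of Python. This is only a toy model of the architecture as described: the function names, record fields, and sample data are assumptions made for illustration, and a real hub would run on infrastructure such as Hadoop rather than in-memory lists.

```python
def landing_zone(raw_records):
    """Stand-in for basic ETL/ELT: drop records with no claim ID and
    strip stray whitespace from string fields."""
    cleaned = []
    for rec in raw_records:
        if not rec.get("claim_id"):
            continue
        cleaned.append(
            {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        )
    return cleaned


def transformational_zone(internal, third_party):
    """Integrate datasets: enrich internal claims with any third-party
    record that carries the same claim ID."""
    extra = {rec["claim_id"]: rec for rec in third_party}
    return [{**claim, **extra.get(claim["claim_id"], {})} for claim in internal]


def analytic_staging(dataset, mode):
    """Fork processing down one of two paths."""
    if mode == "guided":
        # Guided path: a fixed report, here total paid per line of business.
        totals = {}
        for rec in dataset:
            totals[rec["line"]] = totals.get(rec["line"], 0) + rec["paid"]
        return totals
    if mode == "discovery":
        # Discovery path: hand the full prepared dataset to exploratory tools.
        return dataset
    raise ValueError(f"unknown mode: {mode}")


# Source systems: hypothetical internal and third-party feeds.
internal_raw = [
    {"claim_id": "A1", "claimant": " jane doe ", "line": "auto", "paid": 1200.0},
    {"claim_id": "", "claimant": "no id", "line": "auto", "paid": 50.0},
    {"claim_id": "A2", "claimant": "john roe", "line": "property", "paid": 800.0},
    {"claim_id": "A3", "claimant": "ann lee", "line": "auto", "paid": 300.0},
]
third_party_raw = [{"claim_id": "A2", "prior_fraud_referral": True}]

prepared = transformational_zone(
    landing_zone(internal_raw), landing_zone(third_party_raw)
)
report = analytic_staging(prepared, "guided")
exploration = analytic_staging(prepared, "discovery")
```

Both paths read the same prepared dataset, which reflects the point of a shared hub: data is landed and integrated once, then served to reporting and exploratory users alike.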

This architecture provides CNA with the cost, scale, manageability and time-to-results advantages that come with a shared enterprise environment, while also enabling CNA to address the very different analytic needs of multiple business groups within the company.

It also allows business groups that started their big data journey with much-needed heuristic experimentation to then operationalize their analytic discoveries by migrating them to a more suitably efficient and streamlined computing environment.

CNA is using open source software such as Hadoop running on commodity infrastructure to stage data in the information hub. It is also using open source tools such as Pig and R to program analytic queries and construct MapReduce datasets.

However, while Mehta is glad to use these technologies where they are appropriate, he strongly advocates the use of specialized appliances such as those offered by IBM, Oracle and EMC for each distinct set of analytic deliverables. These appliances incorporate compute, storage, networking, middleware, applications and management components that are all pre-configured and pre-integrated by the vendor based on reference architectures created for specific use cases.

“Appliances are very mature and give us the vertical and horizontal scalability we need to deliver the analytic performance users need as big data demands continue to grow across our organization,” he says. “That reliable analytic performance is crucial to the success of CNA’s diverse big data initiatives.”

Big Data Gain

CNA’s investment in big data tools is paying off, according to Wolfe. He says that, since introducing analytics, CNA can directly attribute about $10.5 million in savings to its discovery of claims in the system that warranted the special attention of investigators.

Other savings may be harder to calculate, but are no less important. For example, by proactively flagging suspicious activity occurring across particular healthcare provider networks, Wolfe believes CNA has dramatically reduced its potential future exposure to the questionable behaviors within those networks. “Once you see certain patterns of activity, you can simply put a block on the corresponding healthcare provider tax IDs,” he says. “Those blocks alone may have saved us as much as $4.5 million in 2013.”
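In its simplest form, a block of this kind is a lookup against a list of provider tax IDs before a bill is paid. The following sketch assumes a hypothetical bill record and routing labels; the tax IDs are invented and the article does not describe how CNA's systems apply the block.

```python
# Hypothetical block list; these tax IDs are made up for illustration.
BLOCKED_TAX_IDS = {"00-0000001", "00-0000002"}


def route_bill(bill, blocked=BLOCKED_TAX_IDS):
    """Hold any bill from a blocked provider tax ID for investigation;
    let everything else continue through normal processing."""
    if bill["provider_tax_id"] in blocked:
        return "hold_for_investigation"
    return "process"


bills = [
    {"bill_id": "B1", "provider_tax_id": "00-0000001", "amount": 950.0},
    {"bill_id": "B2", "provider_tax_id": "00-0000777", "amount": 120.0},
]
routes = {bill["bill_id"]: route_bill(bill) for bill in bills}
```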

And Wolfe has some advice for technologists who want to succeed with big data: work closely with non-technical subject-matter experts in the business. “Claim fraud experts have deep knowledge about how entities are related to each other from an investigative perspective, while analytics experts have deep knowledge about how data associated with those entities and their relationships can be sliced and diced,” he says. “If you can get those two very different groups communicating well enough, you can give yourself a pretty serious competitive edge.”