Fulfilling Big Data's Big Promises

By Joe McKendrick June 14, 2013, 2:13 p.m. EDT

I was invited to speak on a big data panel at a nearby university later this month, and the moderator made an interesting request. While the conference itself is full of sessions on the marvels of big data, he asked if I could play devil's advocate and offer some words of caution about it.

It's worthwhile to think through a strategy before rushing headlong into anything. Indeed, a backlash has been brewing against claims that big data delivers deep, flawless analytic insights, and there are plenty of reasons for healthy skepticism.

In big data scenarios, you have managers with no training in statistics making bet-the-business decisions based on data of unknown quality from unvetted sources. You have MBAs who think they can push a button and have insights auto-generated for them. What could possibly go wrong with that?

First, let's be clear on what, exactly, big data is. Yes, it means having a lot of data, to the point where disk arrays are bursting at the seams. But volume is relative: by that definition alone, there has always been “big data.” A decade ago, some organizations had databases topping the one-terabyte mark, and that was huge. The difference these days lies in the types of data coming into organizations: the unstructured stuff, such as log files, sensor data, social data, documents and the like. Many call this “variety,” the second of the four “Vs” that define big data, along with volume, velocity (real time?) and value. But I think variety is what really makes this generation of big data so special.

This all sounds good on paper, and I'm sure everyone out there is intimately familiar with unstructured or schema-less data: we deal with it every time we create a Word or Excel document or post something to Twitter. The problem is that most organizations simply don't know what they have in terms of this data. They have no sense of how much of it exists, or which parts of it may be valuable and which are junk. You can't press forward with a big data initiative built on an unknown quantity.

The issues of trustworthiness were painstakingly worked out over the years in the data warehouse space. Data warehouse managers worked closely with each contributor, making sure that the data fed into the warehouse did not duplicate that of other sources, was synced in terms of time horizons, and used common field names and formats. Now, however, data warehouses can absorb only part of the avalanche of data coming on the scene, and trying to jam everything into them and optimize the information via extract, transform and load (ETL) mechanisms would be too expensive and would slow everything down.

Big data is opening up new worlds for insurers, and it is a path that needs to be followed. But it's a journey that needs to be well managed and thought through. For inspiration, look to what took place with data warehousing, a big-deal technology initiative that yielded many lessons of a management nature. These lessons need to be revisited and applied across the business:

• Begin to inventory the data you have, especially the unstructured information. Start with small pilot projects to measure and attempt to capture some of this data on an enterprise level.

• Work closely with business units to determine what kinds of data analysis or access will solve particular problems. Help them design databases with well-cleansed data that align with the rest of the enterprise. To adequately store unstructured data, they may need to acquire a so-called “NoSQL” database.

• Build business analysis capabilities. Even if data scientists are out of your budget, train or hire individuals who can look at data and ask not only the right questions, but also questions that have never been asked before.

• Encourage critical thinking among business users of the data: What is the source of the information? Are there other potential sources that will help build a conclusion? And, very importantly: What is the context of this data?

Joe McKendrick is an author, consultant, blogger and frequent INN contributor specializing in information technology.

Joe can be reached at joe@mckendrickresearch.com.

This blog was exclusively written for Insurance Networking News. It may not be reposted or reused without permission from Insurance Networking News.

The opinions of bloggers on www.insurancenetworking.com do not necessarily reflect those of Insurance Networking News.
