Confusion Surfaces about the Data Lake

Published July 30, 2014, 10:28 a.m. EDT

The growing hype around data lakes is causing considerable confusion in the information management space, according to Gartner Inc.

Several vendors are marketing data lakes (storage repositories that hold huge amounts of raw data until it’s needed) as essential to capitalizing on big data opportunities, the firm says, but vendors show little agreement on what constitutes a data lake or how to get value from one.

"In broad terms, data lakes are marketed as enterprise-wide data management platforms for analyzing disparate sources of data in its native format," Nick Heudecker, research director at Gartner, said in a statement. "The idea is simple: instead of placing data in a purpose-built data store, you move it into a data lake in its original format. This eliminates the upfront costs of data ingestion, like transformation. Once data is placed into the lake, it's available for analysis by everyone in the organization."

But while the marketing messages suggest that users throughout an enterprise will leverage data lakes, that pitch assumes all of those users are highly skilled at data manipulation and analysis, because data lakes lack semantic consistency and governed metadata, Gartner says.
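To make that trade-off concrete, here is a minimal sketch in Python of the pattern the analysts describe: data lands in its original format with no upfront transformation, and every consumer must interpret it at read time. Everything in it is an illustrative assumption rather than any vendor's product or API: the in-memory `lake` dictionary, the object names, the field names and the sample records are all made up.

```python
import csv
import io
import json

# Hypothetical in-memory "lake": object name -> raw bytes, stored exactly as received.
lake = {}


def ingest(name, payload):
    """Store the payload untouched: no parsing, no schema, no transformation."""
    lake[name] = payload


# Two made-up sources that describe the same kind of fact in different native formats.
ingest("crm/orders.json", b'[{"customer": "Acme", "total_usd": "120.50"}]')
ingest("web/orders.csv", b"client,amount\nAcme,79.5\n")


def read_orders():
    """Interpret raw objects at read time.

    The reader has to know each source's format and field names, because
    the lake itself carries no shared schema or governed metadata.
    """
    for name, payload in lake.items():
        text = payload.decode("utf-8")
        if name.endswith(".json"):
            for row in json.loads(text):
                yield row["customer"], float(row["total_usd"])
        elif name.endswith(".csv"):
            for row in csv.DictReader(io.StringIO(text)):
                yield row["client"], float(row["amount"])


def total_by_customer():
    """Sum order amounts per customer across every source in the lake."""
    totals = {}
    for customer, amount in read_orders():
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


print(total_by_customer())
```

Ingestion here is a one-line copy, which is the cost saving the marketing emphasizes. The reconciliation of "customer" with "client" and "total_usd" with "amount" did not go away, though. It moved into `read_orders`, where each analyst has to rediscover it.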

"The need for increased agility and accessibility for data analysis is the primary driver for data lakes," Andrew White, vice president and distinguished analyst at Gartner, said in a statement. "Nevertheless, while it is certainly true that data lakes can provide value to various parts of the organization, the proposition of enterprise-wide data management has yet to be realized."

This story first appeared at Information Management.