Data lakes are becoming the preferred platform for modernizing data environments, but many organizations continue to struggle with managing them. The result for many is something more closely resembling a data swamp than a data lake.
That is the conclusion of Ben Sharma, chief executive officer at Zaloni, Inc. in Durham, NC, who recently spoke with Information Management about trends he saw among organizations attending the Strata World conference in San Jose in March.
A growing number of organizations are using data lakes either to augment data warehouses or to serve as the enterprise data hub, Sharma says.
“But unmanaged, poorly thought-through data lakes are simply data swamps and their usefulness decays over time,” Sharma explains.
“Companies are realizing that they need more agile data platforms and deeper analytical capabilities to compete effectively in their market,” Sharma says. “The major trend we see is organizations moving from sandbox or single purpose big data applications to enterprise-wide governed data lake implementations.”
A number of other topics emerged as top-of-mind from the Strata event.
“The Internet of things is a big topic. Machine learning is also on everyone’s list. It is early stage but as an industry we are all looking for ways to leverage automated algorithms to improve our understanding of our data and to get faster insight,” Sharma says.
“We are also seeing a real emergence of IT in the big data landscape,” Sharma says. “As data lakes become more mission critical, organizations are looking to IT to provide the governance, security and automation required for these applications.”
Perhaps the biggest challenge organizations are facing is “finding, rationalizing and curating the data from across an enterprise for analytics solutions,” Sharma explains. “Attendees noted that the ability to easily access data, refine data and collaborate on data needs continues to be a large roadblock for many analytic applications.”
“While there are increasingly powerful and effective analytics applications, the data management, integration and governance activities continue to be a major hurdle in rapidly making effective use of scale-out architectures. For this reason, many organizations are still slow to adopt big data technologies in a production capacity,” Sharma says.