Data Lakes: Sink or Swim?
We are all fascinated with data and, in recent years, overwhelmed by the volume of it. But wait, there’s more: data comes in different sizes, shapes and speeds; it’s processed across various platforms and languages; and it’s analyzed in batch, streaming and interactive modes. To top it all off, developers, analysts, data scientists, marketers and executives all want to use it.
Sierra has been waist-deep in the data warehouse space as an early investor in leaders Teradata and Greenplum (acquired by EMC). But in this dynamic environment, the limitations of data warehouses are becoming clear. They store data from various sources in specific, static structures and categories that dictate, at the very point of entry, the kind of analysis that is possible on that data.
This is causing enterprise CIOs to look beyond data warehouses to solutions called “data lakes”. A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The key difference from a data warehouse is that the data structure and requirements are not defined until the data is needed.
“If you think of a datamart (or a data warehouse) as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state,” according to James Dixon, CTO of Pentaho. “The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
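To make the contrast with a warehouse concrete, here is a minimal sketch of the "define structure only when the data is needed" idea, often called schema-on-read. Everything in it is an illustrative assumption rather than any vendor's implementation: the `RAW_LAKE` sample records, the `read_with_schema` helper, and the two schemas are all hypothetical.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
RAW_LAKE = [
    '{"user": "a1", "bytes": "1200", "device": "phone"}',
    '{"user": "b2", "bytes": "300"}',
    '{"user": "a1", "bytes": "500", "plan": "gold"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select fields, cast them, fill gaps with None."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


# One consumer cares about total usage per user...
usage_schema = {"user": str, "bytes": int}
usage = {}
for row in read_with_schema(RAW_LAKE, usage_schema):
    usage[row["user"]] = usage.get(row["user"], 0) + row["bytes"]

# ...while another asks a different question of the same raw data.
device_schema = {"user": str, "device": str}
devices = list(read_with_schema(RAW_LAKE, device_schema))
```

In this sketch, neither consumer had to agree on a structure before the data landed, and fields that one of them ignores (such as `plan`) remain available for a future question. A warehouse would instead have fixed the columns at load time.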
This market trend was validated by many members of the Sierra CXO Advisory Board and led to our recent investment in Zaloni. The company is helping operationalize data lakes with an integrated management platform that brings visibility, governance, and reliability to the data lake. Importantly, it supports hybrid environments, which seem to be common among the CIOs we spoke with.
We are excited by the diverse and game-changing use cases that the company is helping facilitate: a smart city initiative in one of the fastest-growing metropolises, subscriber usage analytics for a top-3 telco, pharmaceutical pipeline analysis, hospitality and airline loyalty programs, healthcare data analytics, and more. And all done under the radar from Raleigh-Durham, North Carolina!