Today, we’re adding metadata to the list of issues that will need to be addressed before data lakes are a useful, realistic concept.
Recently, I’ve been sharing the key concerns and barriers around data lakes. Data lakes, at least in theory, are what you get when you pull Big Data sets, including unstructured data, together. The idea is that data lakes will replace or at least supplement data marts for accessing enterprise-wide information.
Vendors have been hyping up data lakes, but many experts are questioning how realistic data lakes are right now. The challenge isn’t so much creating them as it is managing the data in a useful way, experts say.
Why is managing the data so hard? As I shared in my previous post on current data tools, analytics expert Tom Davenport has said widespread use of big data, a major reason for building a data lake, is unachievable without better tools for integration, transformation and generally fishing around in the data.
On the other hand, data expert David Linthicum, who writes for Informatica’s blog, defends existing tools and sees big data stores as a driver for solutions that combine data integration with data cleansing.
Robin Bloor, president and principal analyst for The Bloor Group, identifies another important barrier in an Information Management article: metadata, and specifically the gap between “old world” and “new world” metadata.
Old world metadata was designed with a minimalist approach. Its goal was to explain just enough about the tables to support integration between programs.
The problem: It lacked context. Is this data about employees or contractors? Who knows? The computer doesn’t care anyway; it just needs to know what’s in the actual fields.
In the “new world” of data, including social media, Internet of Things, public data, streaming data and multiple sets of data, context is critical.
I think it’s easy to underestimate just what a key difference this is between the old world and the new world. Bloor touches on this, but Seth Grimes really drills down on the strategic value of adding that context by looking at mobile devices in his piece, “Metadata, Connection, and the Big Data Story.”
Grimes quotes Marie Wallace, IBM analytics strategist, who calls mobile the “mother lode of contextual metadata.”
“The biggest piece of missing information isn’t the content itself, but the metadata that connects various pieces of content into a cohesive story,” Wallace said. “… Once we combine interactional information with the business action, we can derive insights that will truly transform the social business.”
In other words, metadata is key to adding some of the humanity back into the data. New world metadata provides that context, and therefore can provide valuable clues to motivation and other factors that drive customers.
That brings us back round to Bloor and the disconnect between “old world” and “new world” metadata.
“In essence the main problem with metadata is that it’s not as meaningful as it needs to be for users to know exactly what the data is,” he writes. “You cannot share data effectively without managing the metadata. And this is not as simple as it might seem.”
This is yet another issue that will need to be solved before data lakes are business-ready.
Bloor outlines three ways we can deal with this problem, including master data management (MDM). You should read the full piece for his thoughts, but — SPOILER ALERT — he pretty quickly demolishes two of the options. That leaves us with the one “pragmatic” solution, which he says is creating a map of your metadata.
The term “mapping” always makes me think of pirates, but in this case, he means using a registry to organize the metadata. That may sound like MDM to some of you, and he cautions that while a registry can be a precursor to MDM, it is not actually MDM.
In a world where we often equate tools with discipline, that’s an important clarification.
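To make the registry idea a bit more concrete, here is a minimal sketch in Python. It is not Bloor's design or any vendor's product. The class names (`MetadataEntry`, `MetadataRegistry`), the fields and the sample data sets are all hypothetical, chosen only to show how a registry can record the context that old world metadata left out, such as whether a field describes employees or contractors.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataEntry:
    """One field in one data set, plus the context a person needs (hypothetical)."""
    dataset: str
    column: str
    data_type: str      # "old world" metadata: what is in the field
    description: str    # "new world" metadata: what the data actually means
    source: str         # where the data came from
    tags: frozenset = field(default_factory=frozenset)


class MetadataRegistry:
    """A simple in-memory map of metadata, keyed by (dataset, column)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # A later registration for the same key replaces the earlier one.
        self._entries[(entry.dataset, entry.column)] = entry

    def lookup(self, dataset, column):
        # Returns None when nothing is registered for that field.
        return self._entries.get((dataset, column))

    def find_by_tag(self, tag):
        # Every entry carrying the tag, in a stable order.
        matches = (e for e in self._entries.values() if tag in e.tags)
        return sorted(matches, key=lambda e: (e.dataset, e.column))


# Made-up sample entries: two fields with the same name and type
# but very different meanings.
registry = MetadataRegistry()
registry.register(MetadataEntry(
    dataset="hr_payroll", column="person_id", data_type="int",
    description="Employees only; contractors are excluded",
    source="HR system export",
    tags=frozenset({"people", "employee"}),
))
registry.register(MetadataEntry(
    dataset="vendor_timesheets", column="person_id", data_type="int",
    description="Contractors only",
    source="Vendor portal",
    tags=frozenset({"people", "contractor"}),
))

for entry in registry.find_by_tag("people"):
    print(f"{entry.dataset}.{entry.column}: {entry.description}")
```

Both sample fields are integers called `person_id`, so a computer would treat them as interchangeable. Only the description and tags tell a user they are not. A registry like this catalogs what the data means without trying to reconcile it into a single master record, which is roughly where the line with MDM falls.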