    Eight Ways Any IT Division Can Use Hadoop

    It’s not too hard to find use cases for Hadoop, but many seem to focus on how Internet companies use Big Data. That’s great if you’re LinkedIn, CBS Interactive, or Foursquare, but what about the rest of us?

    What do you do with Hadoop if you’re the CIO of a more traditional enterprise IT division?

    I’ve been reading a lot about this topic while researching “7 Enterprise-Friendly Ways of Dealing with Big Data” for our sister site, Enterprise Apps Today.

    While researching that article, I found two common Hadoop uses that could be deployed by any IT department in any industry. Since writing it, I’ve found more great examples of how CIOs can use Hadoop with existing systems, for a total of eight ways any IT division can use Hadoop:

    1. As a staging layer for analytics. In this scenario, the data is processed and filtered in Hadoop clusters, then fed downstream to a traditional large data warehouse, OLAP cubes or an in-memory analytics platform.

    2. As supplemental storage for an enterprise data warehouse platform. You can use Hadoop clusters as a staging layer for storage behind an EDW or data mart. In fact, Hive is an example of a data warehouse infrastructure built on top of Hadoop clusters (there’s a short query sketch after this list).

    3. As an acquisition and staging layer for unstructured content. This is one way companies are using Hadoop to filter through social media “noise” to find the good stuff that’s worth knowing about. To do that, you’ll need to couple Hadoop with a sentiment engine (see the filtering sketch after this list).

    4. As an ETL tool. Hadoop doesn’t just handle large amounts of data; it’s fast. Some companies already generate so much data in a day that it takes their ETL solutions longer than 24 hours to process it. By using Hadoop and MapReduce to perform the ETL process, they’re able to significantly reduce that processing time (a sample Streaming job follows the list).

    5. As an exploration engine. I’ve been quoting this excellent post by Ravi Kalakota this week, and this is one of what he says are the three primary use cases for Hadoop. What’s cool about Hadoop in this situation is that you can add new data to existing data without having to reindex the entire cluster.

    6. As an archive for historical data. Sometimes you want to archive data, but you also want to be able to access it without the hassle of sending for and loading the archives. Hadoop lets you store large amounts of historical data without the tapes, giving you access to that data at any time, Kalakota points out.

    7. As an enterprise search solution. If you really want to search all your enterprise data, build an indexing infrastructure on top of Hadoop. It scales easily, so it will grow as your data grows. Plus, thanks to the distributed parallel architecture, it’ll be fast, according to Cloudera. (The inverted-index sketch after this list shows the core idea.)

    8. As a data sandbox. Data warehouses are big but unwieldy, which means that if you want to put something in them, you need a plan. Hadoop is much more flexible, so some companies are using it to create a data sandbox where users can play with the data; if they find something worthwhile, they can add that query to the data warehouse. This use case should appeal to any company striving to be more “data-driven.”
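
    To ground use case #2: Hive gives analysts a SQL-like view over files sitting in HDFS, so data staged in the Hadoop cluster can be queried without first moving it into the warehouse. Here is a minimal sketch in Python, assuming the PyHive client and a hypothetical web_logs table; the host, port, and column names are placeholders, not anything specific to the products mentioned in this post.

```python
# Minimal sketch for use case #2: querying a Hive table whose data lives in
# HDFS. Assumes the PyHive client (pip install "pyhive[hive]") and a
# hypothetical `web_logs` table; host, port, and column names are placeholders.
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000,
                       username="analyst", database="default")
cursor = conn.cursor()

# HiveQL reads like SQL, but Hive compiles the query into jobs that run
# across the Hadoop cluster, scanning the files sitting in HDFS.
cursor.execute("""
    SELECT status_code, COUNT(*) AS hits
    FROM web_logs
    GROUP BY status_code
""")

for status_code, hits in cursor.fetchall():
    print(status_code, hits)

cursor.close()
conn.close()
```

    In practice, BI tools usually reach Hive through its JDBC/ODBC drivers rather than a script, but the division of labor is the same: the files stay in HDFS and Hive does the query work.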
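
    For #3, the acquisition layer is often nothing more than a map-only Hadoop Streaming job that throws away the noise before a real sentiment engine ever sees the data. The sketch below is a Python mapper written under that assumption; the JSON field names and the crude keyword lists are illustrative stand-ins, not an actual sentiment model.

```python
#!/usr/bin/env python
"""Hadoop Streaming mapper (use case #3): keep only social posts that look
opinionated. Raw JSON posts land in HDFS, this map-only job drops the noise,
and the survivors go to a real sentiment engine downstream. The field names
('text', 'user') and the keyword lists are illustrative only."""
import json
import sys

POSITIVE = {"love", "great", "awesome", "recommend"}
NEGATIVE = {"hate", "broken", "terrible", "refund"}

for line in sys.stdin:
    try:
        post = json.loads(line)
    except ValueError:
        continue  # skip malformed records instead of failing the whole job
    text = post.get("text", "").lower()
    if set(text.split()) & (POSITIVE | NEGATIVE):
        # Streaming convention: key <TAB> value, one record per line.
        sys.stdout.write("%s\t%s\n" % (post.get("user", "unknown"), text))
```

    You can test it locally with `cat posts.json | python filter_mapper.py` before submitting it through the Hadoop Streaming jar.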
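
    Here is what #4 (and the staging pattern in #1) can look like in practice: a Hadoop Streaming job written as two short Python scripts. The mapper extracts and cleans fields from raw records and the reducer aggregates them; the comma-separated input format and field names are hypothetical, and the reducer’s output is what would get bulk-loaded into the warehouse.

```python
#!/usr/bin/env python
"""etl_mapper.py -- Hadoop Streaming mapper for a simple ETL pass (use case #4).

Extract: parse raw, comma-separated order records (a hypothetical format:
country,amount,currency). Transform: normalize the country code and drop
rows that are malformed or have non-numeric amounts."""
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    if len(fields) != 3:
        continue                      # drop malformed rows
    country, amount, currency = fields
    try:
        amount = float(amount)
    except ValueError:
        continue                      # drop rows with bad amounts
    # Emit key<TAB>value; Hadoop sorts and groups by key between map and reduce.
    sys.stdout.write("%s\t%.2f\n" % (country.strip().upper(), amount))
```

```python
#!/usr/bin/env python
"""etl_reducer.py -- Hadoop Streaming reducer: total order amounts per country.

Because Streaming delivers mapper output sorted by key, the reducer only has
to notice when the key changes. Its output is the cleaned, aggregated extract
that the Load step would push into the data warehouse."""
import sys

current_key, total = None, 0.0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key:
        if current_key is not None:
            sys.stdout.write("%s\t%.2f\n" % (current_key, total))
        current_key, total = key, 0.0
    total += float(value)

if current_key is not None:
    sys.stdout.write("%s\t%.2f\n" % (current_key, total))
```

    Locally, `cat orders.csv | python etl_mapper.py | sort | python etl_reducer.py` reproduces what Hadoop does at scale, which is a handy way to debug before the job ever touches the cluster.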
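
    Finally, the heart of the enterprise search case (#7) is an inverted index, and MapReduce builds one naturally. The toy Hadoop Streaming version below assumes one document per input line in the form doc_id<TAB>text; a production setup would put a search server such as Apache Solr in front of the index rather than serving raw map/reduce output.

```python
#!/usr/bin/env python
"""index_mapper.py -- emit (word, doc_id) pairs for an inverted index
(use case #7). Input is assumed to be one document per line: doc_id<TAB>text."""
import re
import sys

for line in sys.stdin:
    try:
        doc_id, text = line.rstrip("\n").split("\t", 1)
    except ValueError:
        continue
    # set() de-duplicates words within a document before emitting.
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        sys.stdout.write("%s\t%s\n" % (word, doc_id))
```

```python
#!/usr/bin/env python
"""index_reducer.py -- collect the document list for each word.

Hadoop hands the reducer the mapper output grouped and sorted by word, so it
just accumulates document ids until the word changes."""
import sys

current_word, doc_ids = None, []
for line in sys.stdin:
    word, doc_id = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            sys.stdout.write("%s\t%s\n" % (current_word, ",".join(doc_ids)))
        current_word, doc_ids = word, []
    doc_ids.append(doc_id)

if current_word is not None:
    sys.stdout.write("%s\t%s\n" % (current_word, ",".join(doc_ids)))
```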

    Of course, there are other great use cases that will apply across many industries — including building a recommendation engine or using Hadoop to evaluate customer churn. For more on those use cases, I’ll point you to Kalakota’s post and Cloudera’s slideshow or whitepaper, “Ten Common Hadoopable Problems.”

    Loraine Lawson
    Loraine Lawson is a freelance writer specializing in technology and business issues, including integration, health care IT, cloud and Big Data.
