Answers to the questions I hear most often about Hadoop:
What does Hadoop do, and what doesn't it do?
It's important to understand that Hadoop is very focused on the Big Data problem. It knows that its job is to crunch massive amounts of unstructured, opaque data down to small, structured insights as quickly and inexpensively as possible, and it's very good at that job. What Hadoop doesn't do is show you those insights in a way that makes sense to us humans. Taking the insights and getting them in front of your CEO's eyeballs is still your responsibility. Luckily, there are a lot of great technologies to help you with that!
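To give a feel for what "crunching opaque data down to structured insights" means, here is a minimal sketch of the map-and-reduce idea in plain Python. It is not Hadoop's API. The log format, the store names, and the functions `map_line`, `reduce_counts`, and `run_job` are all made up for illustration, and a real job would run the same two steps spread across many machines.

```python
from collections import defaultdict

# Hypothetical raw input: one free-form log line per sale (format is an assumption).
RAW_LINES = [
    "09:02 store=north sold widget-a",
    "09:05 store=south sold widget-b",
    "09:07 store=north sold widget-a",
]


def map_line(line):
    """Map step: turn one raw line into a ((store, product), 1) pair."""
    fields = line.split()
    store = fields[1].split("=", 1)[1]   # "store=north" -> "north"
    product = fields[-1]                 # last token is the product name
    yield (store, product), 1


def reduce_counts(pairs):
    """Reduce step: add up the counts for each (store, product) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_job(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    for (store, product), units in sorted(run_job(RAW_LINES).items()):
        print(store, product, units)
```

The output is a small table of units sold per store and product, which is the kind of compact, structured result the rest of the stack then has to store and display.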
In your opinion, what are the best technologies to combine with Hadoop?
Any stack is going to require a place to store your Hadoop insights, a way to get at that data (say, as a web API), and a way to view the data. My favorite stack is Hadoop, MongoDB for storage, node.js for your web API, and SproutCore for the rich, data-driven sophistication that it brings to web application development.
Why those technologies, and why is this solution unique?
Each interface (e.g. Hadoop to MongoDB, MongoDB to node) is clear, well-established, and best-in-class. One of the biggest challenges in heterogeneous systems is cleanly translating data formats between layers; this system doesn't have that problem, because the data is JSON all the way down! Hadoop and MongoDB work very well together, and ditto MongoDB and node. I'm a node acolyte myself, but I know that Ruby can do a good job here as well. If your dashboard needs are very simple (for example, reload to view an updated pie chart) then SproutCore is overkill, but if you're looking for an interactive, live-updating, drillable dashboard, then SproutCore has all the tools you need to build sophisticated, data-driven rich web apps. The best thing about this solution is that it's high-profile open source from tip to toe. So just as Hadoop means bigger data on a smaller budget, this entire solution gets Hadoop's insights in front of important eyeballs with zero licensing fees. And all of these technologies are at the core of Appnovation's competencies: we know how to build great products in each of them, and we can provide ongoing support and peace of mind.
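To make the "JSON all the way down" point concrete, here is a minimal sketch of one insight passing through the three layers without any format translation. It uses only Python's standard `json` module as a stand-in for each layer. The field names (`store`, `product`, `unitsSold`) and the functions `reducer_output`, `to_document`, and `api_response` are assumptions for illustration, not part of Hadoop, MongoDB, or node.

```python
import json


def reducer_output(store, product, units_sold):
    """Hadoop side (assumed): emit one insight as a single line of JSON."""
    record = {"store": store, "product": product, "unitsSold": units_sold}
    return json.dumps(record, sort_keys=True)


def to_document(json_line):
    """Storage side: parse the line into the object a document store would
    keep. No schema mapping is needed; the parsed object is the document."""
    return json.loads(json_line)


def api_response(documents):
    """Web API side: wrap the stored documents and send them back as JSON."""
    return json.dumps({"results": documents}, sort_keys=True)


if __name__ == "__main__":
    line = reducer_output("north", "widget-a", 42)
    document = to_document(line)
    print(api_response([document]))
```

The same three fields come out of the last function exactly as they went into the first, which is why there is so little glue code between layers. A real MongoDB collection would also attach its own `_id` field to each stored document.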
What sort of use cases do you see this solution working for? What's the real value to customers here?
Let's say you're a regional retail giant. Your inventory management system runs on an overnight batch cycle, so if some radio DJ in Framingham unexpectedly plugs Widget A and your Framingham store is sold out of it by 10 AM, your inventory guy doesn't know about it until the next morning, and probably can't restock until day two. By that time, the DJ is talking about something else.
By moving your batch cycle analysis to Hadoop, you can scale your system with commodity hardware and run that batch cycle every two hours. Your inventory system knows that Framingham is selling more Widget As than usual by 10 AM, and it knows you're sold out by noon! The data pipes through the system almost instantly, and your SproutCore dashboard, which is open on your inventory guy's computer and automatically updating itself, is flashing red forty-five seconds later. By 1 PM, he's got an overnight truck full of widgets scheduled from the warehouse to Framingham for arrival the next morning. You've cut your real-world, widget-on-the-shelf reaction time from two days to less than one, allowing you to take quicker advantage of facts on the ground, and double your sales of Widget A.
Want to know more about Hadoop and Big Data? Contact us today!