Looking Beyond EMC’s Announcement
Well, they finally made it official. This week's announcement from EMC marked its formal entry into the "Big Data" field with an appliance solution for Hadoop – a product called the Greenplum HD Data Computing Appliance, which will be rolling out later this year. The move firmly places EMC in competition with those already in play (on different levels) in a market that is quickly heating up, including IBM, Cloudera, Platform Computing, HStreaming, Yahoo, and a handful of others, each with offerings looking to tame the challenges of "Big Data."
EMC's offering is a bundled appliance solution for Hadoop. It integrates EMC's own Greenplum Data Computing Appliance with a distribution of the Hadoop software. Although it is designed as a plug-and-play solution for those running Hadoop, when it comes to support, EMC will have to work out a solid plan with its partners to make the experience hassle-free for customers.
Notable benefits of EMC's offering, as the company claims, are performance (for its Enterprise edition), fault tolerance, and a turn-key solution. The announcement, however, made no mention of performance benchmarks, potential use cases, or support plans.
Despite all the benefits listed, the appliance does not address some of the important "Big Data" requirements that have been preventing users from moving their applications into production. In particular, I'm referring to the high resource utilization that allows users to do more with less, and the high reliability and efficient manageability needed to guarantee demanding SLA requirements.
"Big Data" is a hot topic today. So hot that almost every IT provider, regardless of its area of focus, wants a piece of the pie. The result? Confused customers. So to narrow down the playing field, let's first take a look at who's who. There are really two types of solution providers today: 1) those offering (or who will offer) a full software stack for Hadoop, such as IBM, EMC, Cloudera, Aster Data, etc.; and 2) those who provide best-of-breed component solutions within the software stack, such as Platform Computing. For the former, the major advantage is support for all layers of the stack (application, distributed runtime, and data). However, the trade-off is living with shortcomings (poor reliability, limited scalability, and low resource utilization, just to name a few) in each layer of the stack. For the latter, the focus is on delivering best-in-class component layer(s) within the full stack to address the specific sets of requirements IT and end users demand for their MapReduce applications – think of a department store versus a specialty store.
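To make the stack layers above concrete, here is a minimal, illustrative sketch of the canonical MapReduce "word count" application. In a real Hadoop deployment the distributed runtime layer schedules the map and reduce tasks across the cluster and performs the shuffle/sort between them; this toy version simulates all three phases in a single process purely to show the programming model, and is not based on any vendor's API.

```python
# Toy word-count in the MapReduce style. In Hadoop, the runtime layer
# distributes these phases across machines; here everything runs locally.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group values by key (handled by the runtime in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is hot", "big data is big"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])  # 3
```

The point of the sketch is that the application layer only supplies the map and reduce functions; everything between them – scheduling, data movement, fault tolerance – is the job of the distributed runtime layer, which is exactly where the two kinds of vendors differentiate.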
We call Platform Computing's upcoming solution for MapReduce a "best of breed" solution because its sole focus is to provide the most complete distributed runtime engine within the Hadoop software stack, making MapReduce applications enterprise ready. What lies underneath that stack is Platform Computing's years of expertise in distributed workload management and resource management. It's a proven enterprise-level technology and the foundation on which many Fortune 500 companies run mission-critical, extremely demanding distributed workloads. Bringing this enterprise capability to the "Big Data" environment is a natural market expansion for the company. As the full-stack wars heat up, Platform's solution can be easily integrated into any alternative stack as a compatible replacement for Hadoop-based runtime environments and become a value-add to its partners. Platform Computing will roll out its first MapReduce distributed runtime offering in early June, a major milestone for the company following the well-received announcement of our support for MapReduce in March. In the upcoming weeks, we will be providing more details on this new product so you can understand why we call it "best of breed." Stay tuned everyone!