Challenge #2: Current Hadoop MapReduce implementations lack flexibility and reliable resource management
As outlined in Part 1 of this series, here at Platform, we’ve identified five significant challenges that we believe are currently hindering Hadoop MapReduce adoption in enterprise environments. The second challenge, addressed here, is the lack of flexibility and reliable resource management in the open source solutions currently on the market.
Current Hadoop MapReduce implementations derived from open source are not equipped to handle the dynamic resource allocation that different applications require. They are also susceptible to single points of failure in the HDFS NameNode and in the JobTracker. As mentioned in Part 1, these shortcomings stem from a fundamental architectural decision in the open source implementation: the job tracker is not separated from the resource manager.

As IT continues its transformation from a cost center to a service-oriented organization, the need for an enterprise-class platform capable of providing services to multiple lines of business will grow. To support MapReduce applications running in a robust production environment, a runtime engine offering dynamic resource management (such as the ability to borrow and lend resources between groups) is critical for helping IT deliver its services to multiple business units while meeting their service level agreements. Dynamic resource allocation promises not only to drive very high resource utilization but also to eliminate IT silos, thereby bringing tangible ROI to enterprise IT data centers.
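To make the borrowing-and-lending idea concrete, here is a minimal sketch of how a scheduler might let one business unit's pool borrow idle slots from another's and return them afterward. All names here (`ResourcePool`, `DynamicScheduler`, `allocate`, `release`) are illustrative assumptions, not any real Hadoop or Platform API.

```python
# Hypothetical sketch of dynamic resource borrowing between tenant pools.
# Class and method names are illustrative, not a real scheduler API.

class ResourcePool:
    """A per-tenant pool of compute slots with a guaranteed share."""
    def __init__(self, name, guaranteed_slots):
        self.name = name
        self.guaranteed = guaranteed_slots
        self.in_use = 0   # slots the tenant itself is running on
        self.lent = 0     # slots currently lent to other pools

    def idle_slots(self):
        return self.guaranteed - self.in_use - self.lent


class DynamicScheduler:
    """Serves a request from the tenant's own pool first, then borrows
    idle slots from other tenants; borrowed slots are returned on release."""
    def __init__(self, pools):
        self.pools = {p.name: p for p in pools}

    def allocate(self, tenant, slots_needed):
        pool = self.pools[tenant]
        granted = []  # list of (lender_name, slot_count)
        # Use the tenant's own guaranteed capacity first.
        own = min(pool.idle_slots(), slots_needed)
        if own:
            pool.in_use += own
            granted.append((tenant, own))
        slots_needed -= own
        # Borrow idle capacity from other pools to absorb the burst.
        for other in self.pools.values():
            if slots_needed == 0:
                break
            if other is pool:
                continue
            borrow = min(other.idle_slots(), slots_needed)
            if borrow:
                other.lent += borrow
                granted.append((other.name, borrow))
                slots_needed -= borrow
        return granted  # may be partial if the whole cluster is busy

    def release(self, tenant, granted):
        # Return borrowed slots to their lenders so their SLAs are intact.
        for lender, count in granted:
            if lender == tenant:
                self.pools[tenant].in_use -= count
            else:
                self.pools[lender].lent -= count
```

In this sketch a burst of 14 tasks from a tenant with 10 guaranteed slots would run on its own 10 plus 4 borrowed from an idle neighbor, rather than queueing in a silo; releasing the grant restores each pool's capacity.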
Equally important is high reliability. An enterprise-class MapReduce implementation must be highly reliable, with no single points of failure. Some may argue that the existing Hadoop MapReduce solution has shown very low failure rates and that reliability is therefore not a high priority. However, our experience and long history of working with enterprise-class customers has proven that in mission-critical environments, the cost of a single failure is measured in millions of dollars and is in no way justifiable for the organization. Eliminating single points of failure significantly reduces downtime risk for IT. For many organizations, that translates into faster time to results and higher profits.
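One common way to eliminate a single point of failure such as a master node is an active/standby pair with heartbeat-driven failover. The sketch below is a simplified illustration of that pattern under assumed names (`MasterNode`, `FailoverController`); it is not Hadoop's actual high-availability mechanism.

```python
# Illustrative active/standby failover for a master service such as a
# job tracker. All names are hypothetical, not real Hadoop APIs.
import time

class MasterNode:
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        # Called periodically by the healthy master process.
        self.last_heartbeat = time.monotonic()

class FailoverController:
    """Promotes the standby when the active master misses its heartbeat
    deadline, so the master role is no longer a single point of failure."""
    def __init__(self, active, standby, timeout=5.0):
        self.active = active
        self.standby = standby
        self.timeout = timeout

    def check(self):
        # If the active master has been silent past the deadline,
        # swap roles and keep serving from the former standby.
        if time.monotonic() - self.active.last_heartbeat > self.timeout:
            self.active, self.standby = self.standby, self.active
        return self.active.name
```

The key design point is that clients always talk to whichever node `check()` reports as active, so a master crash costs at most one heartbeat timeout of downtime instead of an outage.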