Blog Series – Five Challenges for Hadoop MapReduce in the Enterprise, Part 4

Challenge #4: Lack of Quality of Service

We are back after a short break, and our look at the challenges in the current implementation of Hadoop MapReduce continues. In this post, let’s take a look at the fourth challenge in the existing Hadoop stack: the lack of quality of service.

By high quality of service, we mean the capability to dynamically allocate available IT infrastructure based on workload requirements, maximizing resource utilization and preventing silos. Those capabilities lead to better application performance and faster time to results, and therefore provide a high return on investment for the IT organization. The architecture of the existing open source Hadoop stack limits these capabilities. As mentioned in part 2 of this blog series, the single job tracker in the current Hadoop implementation is not separated from the resource manager. As a result, the job tracker does not provide sufficient resource management functionality to allow dynamic lending and borrowing of available IT resources. This creates a static IT environment in which each Hadoop application can only run on a pre-assigned set of resources at a given time, with no exceptions. As the requirements of the application change, resources need to be reconfigured manually to meet the new demand. Such a static IT infrastructure creates the following issues for an IT organization:

· Unable to provide the necessary and guaranteed services to multiple lines of business
· Unable to manage real-time workload requirements
· Slower performance and time to discovery
· Increased management complexity

The result?  Underutilized resources and a higher total cost of ownership for IT.
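To make the contrast between static partitioning and dynamic lending and borrowing concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `App` class, the slot counts, and the application names are hypothetical, and the code models neither Hadoop's job tracker nor any Platform Computing product. It simply compares how many slots are busy when each application is confined to its pre-assigned share versus when idle slots can be lent to applications with unmet demand.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    owned_slots: int  # slots pre-assigned to this application (hypothetical)
    demand: int       # slots the workload needs right now (hypothetical)


def static_allocation(apps):
    """Each application may only use its own pre-assigned slots."""
    return {a.name: min(a.owned_slots, a.demand) for a in apps}


def dynamic_allocation(apps):
    """Applications use their own slots first, then borrow idle slots."""
    alloc = static_allocation(apps)
    # Slots that owners are not using are available for lending.
    idle = sum(a.owned_slots - alloc[a.name] for a in apps)
    for a in apps:
        unmet = a.demand - alloc[a.name]
        borrowed = min(unmet, idle)
        alloc[a.name] += borrowed
        idle -= borrowed
    return alloc


def utilization(apps, alloc):
    """Fraction of all slots in the cluster that are doing work."""
    total = sum(a.owned_slots for a in apps)
    return sum(alloc.values()) / total


# Illustrative numbers only: one busy application, two mostly idle ones.
apps = [App("risk", 40, 70), App("etl", 40, 10), App("reporting", 20, 5)]
static = static_allocation(apps)
dynamic = dynamic_allocation(apps)

print("static :", static, f"{utilization(apps, static):.0%}")
print("dynamic:", dynamic, f"{utilization(apps, dynamic):.0%}")
```

With these made-up numbers, the static setup leaves the busy application short of slots while the other two sit partly idle; with borrowing, the same hardware covers all of the demand. A real scheduler would also need a policy for reclaiming lent slots when their owner's demand returns, which this sketch deliberately leaves out.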

In contrast to a static IT infrastructure, a sophisticated runtime built on a service-oriented architecture (SOA) brings quality of service to IT organizations committed to providing high quality services to their internal and/or external clients. Such a runtime solution will help transform IT into a true service provider and help meet demanding requirements (high availability, dynamic resource allocation, ease of management, etc.) in the production environment. As new technologies such as Hadoop and MapReduce continue their penetration into the mainstream market, more applications will be developed and moved into production. Quality of service will undoubtedly become a critical consideration for IT in the next wave of the Big Data evolution.

Please join us at SC11 for a free breakfast briefing: “Overcoming Your MapReduce Barriers”.  Register today to secure your spot!

Why Combine Platform Computing with IBM?

You may have read the news that Platform Computing has signed a definitive agreement to be acquired by IBM, and you may wonder why. I’d like to share with you our thinking at Platform and what our relevance is to you and the dramatic evolution of enterprise computing. Even though I am not an old man yet and am usually too busy doing stuff, for once I will try to look at the past, present and future.

After the first two generations of IT architecture, centralized mainframe and networked client/server, IT has finally advanced to its third generation architecture of (true) distributed computing. An unlimited number of resource components, such as servers, storage devices and interconnects, are glued together by a layer of management software to form a logically centralized system – call it virtual mainframe, cluster, grid, cloud, or whatever you want. The users don’t really care where the “server” is, as long as they can access application services – probably over a wireless connection. Oh, they also want those services to be inexpensive and preferably on a pay-for-use basis. Like a master athlete making the most challenging routines look so easy, such a simple computing model actually calls for a sophisticated distributed computing architecture. Users’ priorities and budgets differ, apps’ resource demands fluctuate, and the types of hardware they need vary. So, the management software for such a system needs to be able to integrate whatever resources, morph them dynamically to fit each app’s needs, arbitrate amongst competing apps’ demands, and deliver IT as a service as cost effectively as possible. This idea gave birth to commodity clusters, enterprise grids, and now cloud. This has been the vision of Platform Computing since we began 19 years ago.

Just as client/server took 20 years to mature into the mainstream, clusters and grids have taken 20 years, and cloud for general business apps is still just emerging. Two areas have been leading the way: HPC/technical computing followed by Internet services. It’s no accident that Platform Computing was founded in 1992 by Jingwen Wang and me, two renegade Computer Science professors with no business experience or even interest. We wanted to translate ‘80s distributed operating systems research into cluster and grid management products. That’s when the lowly x86 servers were becoming powerful enough to do the big proprietary servers’ job, especially if a bunch of them banded together to form a cluster, and later on an enterprise grid with multiple clusters. One system for all apps, shared with order. Initially, we talked to all the major systems vendors to transfer university research results to them, but there were no takers. So, we decided to practice what we preached. We have been growing and profitable all these 19 years with no external funding. Using clusters and grids, we replaced a supercomputer at Pratt & Whitney to run 10 times more Boeing 777 jet engine simulations, and we supported CERN to process insane amounts of data looking for God’s Particle. While the propeller heads were having fun, enterprises in financial services, manufacturing, pharmaceuticals, oil & gas, electronics, and the entertainment industries turned to these low cost, massively parallel systems to design better products and devise more clever services. To make money, they compute. To out-compete, they out-compute.

The second area adopting clusters, grids and cloud, following HPC/technical computing, is Internet services. By the early 2000s, business at Amazon and Google was going gangbusters, yet they wanted a more cost effective and infinitely scalable system versus buying expensive proprietary systems. They developed their own management software to lash together x86 servers running Linux. They even developed their own middleware, such as MapReduce for processing massive amounts of “unstructured” data.

This brings us to the present day and the pending acquisition of Platform by IBM. Over the last 19 years, Platform has developed a set of distributed middleware and workload and resource management software to run apps on clusters and grids. To keep leading our customers forward, we have extended our software to private cloud management for more types of apps, including Web services, MapReduce and all kinds of analytics. We want to do for enterprises what Google did for themselves, by delivering the management software that glues together whatever hardware resources and applications these enterprises use for production. In other words, Google computing for the enterprise. Platform Computing.

So, it’s all about apps (or IaaS :-) ). Old apps going distributed, new apps built as distributed. Platform’s 19 years of profitable growth have been fueled by delivering value to more and more types of apps for more and more customers. Platform has continued to invest in product innovation and customer services.

The foundation of this acquisition is the ever expanding technical computing market going mainstream. IDC has been tracking this technical computing systems market segment at $14B, or 20% of the overall systems market. It is growing at 8%/year, or twice the growth rate of servers overall. Both IDC and users also point out that the biggest bottleneck to wider adoption is the complexity of clusters and grids, and thus the escalating need for middleware and management software to hide all the moving parts and just deliver IT as a service. You see, it’s well worth paying a little for management software to get the most out of your hardware. Platform has a single mission: to rapidly deliver effective distributed computing management software to the enterprise. On our own, especially in the early days when the going was tough, we have been doing a pretty good job for some enterprises in some parts of the world. But, we are only 536 heroes. Combined with IBM, we can get to all the enterprises worldwide. We have helped our customers to run their businesses better, faster, cheaper. After 19 years, IBM convinced us that there can also be a “better, faster, cheaper” way to help more customers and to grow our business. As they say, it’s all about leverage and scale.

We all have to grow up, including the propeller heads. Some visionary users will continue to buy the pieces of hardware and software to lash together their own systems. Most enterprises expect to get whole systems ready to run their apps, but they don’t want to be tied down to proprietary systems and vendors. They want choices. Heterogeneity is the norm rather than the exception. Platform’s management software delivers the capabilities they want while enabling their choices of hardware, OS and apps.

IBM’s Systems and Technology Group wants to remain a systems business, not a hardware business nor a parts business. Therefore, IBM’s renewed emphasis is on systems software in its own right. IBM and Platform, the two complementary market leaders in technical computing systems and management software respectively, are coming together to provide overall market leadership and help customers to do more cost effective computing. In IBM speak, it’s smarter systems for smarter computing enabling a Smarter Planet. Not smarter people. Just normal people doing smarter things supported by smarter systems.

Now that I hopefully have you convinced that we at Platform are not nuts coming together with IBM, we hope to show you that Platform’s products and technologies have legs to go beyond clusters and grids. After all, HPC/technical computing has always been a fountainhead of new technology innovation feeding into the mainstream. Distributed computing as a new IT architecture is one such example. Our newer products for private cloud management, Platform ISF, and for unstructured data analytics, Platform MapReduce, are showing some early promise, even awards, followed by revenue. 

IBM expects Platform to operate as a coherent business unit within its Systems and Technology Group. We got some promises from folks at IBM. We will accelerate our investments and growth. We will deliver on our product roadmaps. We will continue to provide our industry-best support and services. We will work even harder to add value to our partners, including IBM’s competitors. We want to make new friends while keeping the old, for one is silver while the other is gold. We might even get to keep our brand name. After all, distributed computing needs a platform, and there is only one Platform Computing. We are an optimistic bunch. We want to deliver to you the best of both worlds – you know what I mean. Give us a chance to show you what we can do for you tomorrow. Our customers and partners have journeyed with Platform all these years and have not regretted it. We are grateful to them eternally.

So, with a pile of approvals, Platform Computing as a standalone company may come to an end, but the journey continues to clusters, grids, clouds, or whatever you want to call the future. The prelude is drawing to a close, and the symphony is about to start. We want you to join us at this show.

Thank you for listening.

Taming Big Data – A Recap of the O’Reilly Strata Conference

O’Reilly Strata, a conference dedicated to data science, held its second meeting of the year from September 22 – 23 in New York City.  The conference drew close to 500 attendees, including all the major technology providers in the space as well as user organizations from various industries that are dealing with Big Data problems. Platform Computing was one of the sponsors of the conference, and our introduction of the newly released Platform MapReduce 1.5 generated wide interest among conference participants.

Photo: Platform team greets booth visitors

My takeaways from the conference are the following:
1) Data science is hot. The Strata conference lured participants from various industries. We met startups working on web analytics, energy companies trying to analyze their log files, banks looking for use cases and solutions to help analyze their large data sets fast, and of course, Web 2.0 companies that are already at the forefront of tackling Big Data problems and are exploring better solutions than what they currently use. The enthusiasm around cracking Big Data has never been higher.
2) The Big Data market is still young. Many discussions we had at the show revealed that the majority of mainstream organizations (excluding Web 2.0 companies like Google, Yahoo, Facebook) are still at the early stages of adoption, where they are either exploring various technologies on the market, figuring out the proper applications to build upon new technologies such as Hadoop and MapReduce, or in the midst of building small pilot projects. Comments such as “We are thinking about moving certain applications to Hadoop,” or “We only have one Hadoop project running at this time” were often heard at the conference. A majority of the organizations that have large amounts of data are just beginning to tap into Big Data and look into proper use cases. The FAQ these days is “What to do with my data?” and there is no simple answer to that.
3) There’s a shortage of skilled resources. While Hadoop and MapReduce appear to be promising approaches to access and analyze Big Data, they are also new to developers, and the learning curve is rather steep. Adding to the length of the learning cycle is the fact that development tools in the ecosystem have yet to mature. The reality is that there is a lack of production quality applications for mainstream users; most code today is developed in-house and is still being tested in R&D labs.
4) The market is fragmented. Various tools have been developed to fill out the ecosystem while the mainstream market is still catching up on the basics. We believe the gap will close and the market will eventually hit the tipping point as more applications become available. But for now, the path for Big Data remains wildly unpredictable.
Big Data is here to stay. However, “Making Data Work”, the slogan at the O’Reilly Strata conference, is no easy task. Companies dealing with large amounts of data have a lot on their plates today: disruptive technologies, new application development, and understanding the meaning of the new discoveries and their business impact, just to name a few. Needless to say, a well built-out ecosystem is critical to support all the efforts taking place in the market. At Platform Computing, we are committed not only to providing the best solution in the ecosystem by leveraging our proven technology, but also to working toward bringing a viable, end-to-end solution to the market.