ISC Cloud 2011

The European conference on the intersection of cloud computing and HPC has just wrapped up, and I'm pleased to report that it delivered a considerable helping of useful and exciting information on the topic.

Though cloud computing and HPC have tended to stay separate, interest within the HPC community has been building since the SC10 conference, primarily because access to additional temporary resources is very tempting. Other reasons for HPC users and architects to evaluate the viability of cloud include total cost of ownership comparisons, as well as startup businesses that may need temporary access to HPC but lack the capital to purchase dedicated infrastructure.

Conclusions varied from presenter to presenter, though some things were generally agreed upon:

  1. If using Amazon EC2, HPC applications must use the Cluster Compute instance type to achieve performance comparable to local clusters (see the sketch after this list).
  2. Fine-grained MPI applications are not well suited to the cloud, simply because none of the major vendors offer InfiniBand or another low-latency interconnect on the back end.
  3. Running long term in the cloud, even with favorable pricing agreements, is much more expensive than running in local data centers, as long as those data centers already exist. (No one presented a cost analysis that included data center construction as an amortized cost of doing HPC.)
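To make point 1 concrete, here is a minimal sketch of how one might request cluster compute capacity inside a placement group, which is what provides the low-latency, full-bisection networking between nodes. This uses the present-day boto3 SDK as an assumption on my part, and the AMI ID, key pair, node count and instance type are placeholders rather than recommendations.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Cluster compute instances rely on a "cluster" placement group for
# low-latency, full-bisection networking between nodes.
ec2.create_placement_group(GroupName="hpc-group", Strategy="cluster")

# Launch a small pool of cluster compute nodes into that placement group.
ec2.run_instances(
    ImageId="ami-xxxxxxxx",       # placeholder: an HPC-ready AMI
    InstanceType="cc1.4xlarge",   # a cluster compute instance type of the era
    MinCount=8,
    MaxCount=8,
    KeyName="my-keypair",         # placeholder key pair
    Placement={"GroupName": "hpc-group"},
)
```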


Another interesting trend was the different points of view depending on where the presenter came from. Generally, researchers from national labs had the point of view that cloud computing was not comparable to their in-house supercomputers and was not a viable alternative for them. Also, compared to the scale of their in-house systems, the resources available from Amazon or others were seen as quite limited.

Conversely, presenters from industry had the opposite point of view (notably a presentation given by Guillaume Alleon from EADS). Their much more modest requirements seemed to map much better into the available cloud infrastructure and the conclusion was positive for cloud being a capable alternative to in-house HPC.

Perhaps this is another aspect of the disparity between capability and capacity HPC computing. One maps well into the cloud, the other doesn't.

Overall it was a very useful two days. My only regret was not being able to present Platform's view on HPC cloud; see my next blog for some technologies to keep an eye on for overcoming cloud adoption barriers. For anyone interested in HPC and the cloud, this was the best and richest-content event I've ever attended. Highly recommended.

Congratulations to Rick Parker – a Platform Customer, Partner and SuperNova Semifinalist!


Over the past year, I’ve had the pleasure of working closely with someone I regard as a true visionary when it comes to IT management and unlocking the power of private cloud computing. I am referring to Rick Parker who, until recently, was the IT Director at Fetch Technologies, a longtime customer of Platform Computing that leverages Platform ISF for its private cloud management. I am now proud to call Rick by a new name: Protostar! From a pool of more than 70 applicants, Rick was recognized in Constellation Research’s SuperNova Awards among an elite group of semifinalists who have overcome the odds in successfully applying cloud computing technologies within their organizations.

Most award programs recognize technology suppliers for advancements in the market. Few programs recognize individuals for their courage in battling the odds to effect change in their organizations. The Constellation SuperNova Awards celebrate the explorers and pioneers who successfully put new technologies to work and the leaders who have created disruptions in their market. Rick fits the bill perfectly, and an all-star cast of judges (including Larry Dignan at ZDNet and Frank Scavo at Constellation, to name a couple) agreed. The award recognizes applicants who embody the human spirit to innovate, overcome adversity, and successfully deliver market-changing approaches.

Here is an excerpt from the award nomination to give you a peek into Rick’s story of implementing cloud computing at Fetch (with a little help from Platform Computing solutions!).
Since joining Fetch, Rick has successfully implemented a dynamic, almost fully virtualized data center leveraging private clouds and Platform Computing’s ISF cloud management solution. Rick’s journey began with a simple idea: to build the perfect data center that moved IT away from the business of server management and toward true data center management. As part of this vision, Rick founded Bedouin Networks to create one of the first, if not the first, public cloud services in 2006 and deliver radically improved cost effectiveness and reliability in data center design.
In 2007, Rick carried his passion for disruptive data centers with him when he joined Fetch Technologies. Fetch Technologies is a Software-as-a-Service (SaaS) provider that enables organizations to extract, aggregate and use real-time information from websites and, as such, depends on its ability to use data center resources efficiently and effectively. At any given moment, Rick and his IT team can get a call to provision more compute resources for Fetch’s fast-growing customer base. Before using Platform ISF, Fetch provisioned resources manually to increase SaaS capacity, which usually took several hours of personnel time per server. The cost effectiveness of Rick’s innovative design not only made Fetch’s products and services more profitable, it made them possible: the expenditure that would have been required for the hundreds of physical servers, networking, and data center costs to enable the additional capacity could have exceeded the potential revenue.

A Protostar
Rick has been a vital asset to the Fetch team and an incredible partner to the Platform Computing team over the years. We congratulate him on this latest recognition and wish him the best of luck, in search of more disruption, in his new position as Cloud Architect at Activision. We look forward to working with Rick on his next private cloud adventure. Congratulations, Protostar!

One small step for man, one enormous leap for science

News from CERN last week that E may not, in fact, equal mc² was earth-shattering. As the news broke, physicists everywhere quivered in the knowledge that everything they once thought true may no longer hold, and the debate that continues to surround the announcement this week is fascinating. Commentary ranges from those excited by the ongoing uncertainties of the modern world to those who are adamant that mistakes have been made.

This comment from Matt Alden-Farrow on the BBC sums up the situation nicely:

“This discovery could one day change our understanding of the universe and the way in which things work. Doesn’t mean previous scientists were wrong; all science is built on the foundation of others work.”

From our perspective, this comment not only sums up the debate, but also the reality of the situation. Scientific discoveries are always built on the findings of those that went before and the ability to advance knowledge often depends on the tools available.

Isaac Newton developed his theory of gravity when an apple fell on his head – the sophisticated technology we rely on today just didn’t exist. His ‘technology’ was logic. Einstein used chemicals and mathematical formulae that had already been discovered and proven. CERN used the Large Hadron Collider and high performance computing.

The reality is that scientific knowledge is built in baby steps, and the time these take is often determined by the time it takes for the available technology to catch up with the existing level of knowledge. If we had never known Einstein’s theory of relativity, who’s to say that CERN would have even attempted to measure the speed of particle movement?

IDC HPC User Forum – San Diego

Platform just returned from attending the IDC HPC User Forum held Sept. 6-8 in San Diego.


As opposed to previous years, this year’s event seemed to have drawn fewer people from the second and third tiers of the HPC industry. Overall attendance also appeared to be about half that of the April event in Texas.


This time the IDC HPC User Forum was dominated by a focus on software and the need for recasting programming models. There was also a renewed focus on getting ISVs and open source development teams to adopt programming models that can scale far beyond their current limits. Two factors are driving this emphasis:
  • Extremely parallel internals for compute nodes (from both multi-core and accelerator [CUDA, Intel, AMD] points of view).
  • The focus on “exa” scale, which by all counts will be achieved by putting together ever-increasing numbers of commodity servers.

Typically there is a theme to the presentations for the multi-day event, and this forum was no different. Industry presentations were very focused on the material science being performed primarily by the US national labs and on future applications of the results being obtained. The product horizon for the technologies presented was estimated at approximately 10 years.


In contrast to the rest of the industry, which is very cloud focused right now, cloud computing was presented or mentioned only three times by various vendors and was also mentioned by the Lawrence Berkeley National Laboratory (LBNL) at the forum. When it comes to cloud, there seems to be a split between what the vendors are focusing on and what the attendees believe. Specifically, attendees from national laboratories tend to be focused on “capability” computing (e.g., large massively parallel jobs running on thousands of processors). Jeff Broughton from LBNL presented data from a paper showing how, for the most part, cloud computing instances are not ready for the challenge of doing HPC.


Though we can’t refute any of the data or claims made by Mr. Broughton, the conclusions drawn from his data might extend beyond what the facts support. For instance, in our experience here at Platform, we’ve found that most HPC requirements in industry do not span more than 128 cores in a single parallel job, nor do they require more than 256 GB of memory. The requirements of most companies doing HPC are significantly more modest and are therefore much better suited to being addressed by a cloud computing model.


We at Platform have long been fighting the “all or nothing” notion of HPC employing cloud technology. Rather, we believe that industry – especially in Tiers 2 and 3 – will be able, to a greater or lesser degree, to make extremely beneficial use of cloud computing to address its more modest HPC requirements. Platform is focused on developing products to help these customers easily realize this benefit. Stay tuned for more on Platform’s cloud family of products for HPC in the coming months…

Blog Series – Five Challenges for Hadoop MapReduce in the Enterprise, Part 3


Challenge #3: Lack of Application Deployment Support

In my previous blog, I explored the shortcomings in resource management capabilities in the current open source Hadoop MapReduce runtime implementation. In this installment of the “Five Challenges for Hadoop MapReduce in the Enterprise” series, I’d like to take a different view of the existing open source implementation and examine the weaknesses in its application deployment capabilities. This is critically important because, at the end of the day, it is applications that a runtime engine needs to drive; without a sufficient support mechanism, a runtime engine will be of only limited use.
 
To better illustrate the shortcomings of the current Hadoop implementation's application support, we use the diagram below to demonstrate how the current solution handles workloads.


As shown in the diagram, the current Hadoop implementation does not provide multiple workload support. Each cluster is dedicated to a single MapReduce application, so if a user has multiple applications, s/he has to run them serially on that same resource or buy another cluster for the additional application. This single-purpose resource implementation creates inefficiency, a siloed IT environment and management complexity (IT ends up managing multiple resources separately).
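As a rough illustration of that serial, single-purpose pattern, the sketch below submits two hypothetical MapReduce applications to the same dedicated cluster one after the other; the jar names, main classes and HDFS paths are made up.

```python
import subprocess

# Two hypothetical MapReduce applications sharing one dedicated cluster.
jobs = [
    ["hadoop", "jar", "etl-app.jar", "com.example.Etl", "/data/in", "/data/stage"],
    ["hadoop", "jar", "report-app.jar", "com.example.Report", "/data/stage", "/data/out"],
]

for job in jobs:
    # Each job occupies the whole cluster; the next one cannot start
    # until the previous one has finished.
    subprocess.run(job, check=True)
```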

Our enterprise customers have told us they require a runtime platform designed to support mixed workloads running across all resources simultaneously so that multiple lines of business can be served. Customers also need support for workloads that may have different characteristics or are written in different programming languages. For instance, some of those applications could be data intensive, such as MapReduce applications written in Java, while others could be CPU intensive, such as Monte Carlo simulations, which are often written in C++; a runtime engine must be designed to support both simultaneously. In addition, the workload scheduling engine in this runtime has to be able to handle many levels of fair-share scheduling priorities and also be capable of handling exceptions such as preemptive scheduling. It needs to be smart enough to detect resource utilization levels so it can reclaim resources when they become available. Finally, a runtime platform needs to be application agnostic so that developers do not have to make code changes or recompile for the runtime engine to support their applications. The architecture design of the current Hadoop implementation simply does not provide those enterprise-class features required in a true production environment.
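To make the scheduling requirements above a little more concrete, here is a toy Python sketch of weighted fair-share allocation with preemption across two hypothetical pools. It is purely illustrative: it is not the scheduler inside Platform's products or Hadoop, and the pool names, weights and numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    weight: float     # relative priority of this line of business
    demand: int       # slots the pool's pending workload wants
    running: int = 0  # slots the pool currently occupies

def fair_share_allocation(pools, total_slots):
    """Hand out slots one at a time to the pool that is furthest below
    its weighted fair share and still has unmet demand."""
    alloc = {p.name: 0 for p in pools}
    for _ in range(total_slots):
        candidates = [p for p in pools if alloc[p.name] < p.demand]
        if not candidates:
            break
        # The pool with the smallest allocated/weight ratio is the most starved.
        neediest = min(candidates, key=lambda p: alloc[p.name] / p.weight)
        alloc[neediest.name] += 1
    return alloc

def preemptions(pools, alloc):
    """Slots each pool should give back so under-served pools can reclaim them."""
    return {p.name: max(0, p.running - alloc[p.name]) for p in pools}

if __name__ == "__main__":
    pools = [
        Pool("mapreduce-etl", weight=2.0, demand=60, running=80),   # Java, data intensive
        Pool("montecarlo-cpp", weight=1.0, demand=50, running=20),  # C++, CPU intensive
    ]
    alloc = fair_share_allocation(pools, total_slots=100)
    print("target allocation:", alloc)   # e.g. {'mapreduce-etl': 60, 'montecarlo-cpp': 40}
    print("slots to preempt:", preemptions(pools, alloc))
```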