ISC cloud 2011

The European conference concerning the intersection between cloud computing and HPC has just finished, and it's very pleasing to report that this conference delivered considerable helpings of useful and exciting information on the topic.

Though cloud computing and HPC have tended to stay separate, interest within the HPC community has been growing since the SC2010 conference, primarily because access to additional temporary resources is very tempting. However, other reasons for HPC users and architects to evaluate the viability of cloud include total cost of ownership comparisons, and startup businesses which may need temporary access to HPC but do not have the capital to purchase dedicated infrastructure.

Conclusions varied from presenter to presenter, though some things were generally agreed upon:

  1. If using Amazon EC2, HPC applications must use Cluster Compute instances to achieve performance comparable to local clusters.
  2. Fine-grained MPI applications are not well suited to the cloud, simply because none of the major vendors offers InfiniBand or another low-latency interconnect on the back end.
  3. Running long term in the cloud, even with favorable pricing agreements, is much more expensive than running in local data centers, as long as those data centers already exist. (No one presented a cost analysis which included the data center build costs as an amortized cost of doing HPC.)
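
To make that last point concrete, here is a minimal sketch of how a comparison might fold data center build costs into the on-premise side. It is not taken from any presentation at the conference; the function names, the amortization periods and every number in the usage lines are hypothetical placeholders, not measured prices.

```python
HOURS_PER_YEAR = 8760


def on_prem_annual_cost(hardware_cost, hardware_life_years, annual_operations,
                        datacenter_build_cost=0.0, datacenter_life_years=15):
    """Yearly cost of a local cluster.

    Hardware and (optionally) the data center build are spread evenly over
    their assumed lifetimes; leave datacenter_build_cost at 0 to model a
    facility that already exists.
    """
    hardware = hardware_cost / hardware_life_years
    facility = datacenter_build_cost / datacenter_life_years
    return hardware + facility + annual_operations


def cloud_annual_cost(instances, hourly_rate, utilization):
    """Yearly cost of renting `instances` nodes for a fraction of the year."""
    return instances * hourly_rate * HOURS_PER_YEAR * utilization


if __name__ == "__main__":
    # Placeholder inputs only: substitute real quotes before drawing conclusions.
    existing_dc = on_prem_annual_cost(1_000_000, 4, 150_000)
    new_dc = on_prem_annual_cost(1_000_000, 4, 150_000,
                                 datacenter_build_cost=3_000_000)
    always_on = cloud_annual_cost(100, 1.0, utilization=1.0)
    bursty = cloud_annual_cost(100, 1.0, utilization=0.25)
    print(existing_dc, new_dc, always_on, bursty)
```

Which side comes out ahead in a model like this depends entirely on the inputs, in particular on utilization and on whether the build cost is counted at all.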

Another interesting trend was the different points of view depending on where the presenter came from. Generally, researchers from national labs had the point of view that cloud computing was not comparable to their in-house supercomputers and was not a viable alternative for them. Also, compared to the scale of their in-house systems, the resources available from Amazon or others were seen as quite limited.

Conversely, presenters from industry had the opposite point of view (notably a presentation given by Guillaume Alleon from EADS). Their much more modest requirements seemed to map much better into the available cloud infrastructure and the conclusion was positive for cloud being a capable alternative to in-house HPC.

Perhaps this is another aspect of the disparity between capability and capacity computing in HPC. Capacity workloads map well into the cloud; capability workloads don't.

Overall it was a very useful two days. My only regret was not being able to present Platform's view on HPC cloud. See my next blog for some technologies to keep an eye on for overcoming cloud adoption barriers. Also, if anyone is interested in HPC and the cloud, this was the best and richest content event I've ever attended. Highly recommended.

Congratulations to Rick Parker – a Platform Customer, Partner and SuperNova Semifinalist!

Over the past year, I’ve had the pleasure of working closely with someone I regard as a true visionary when it comes to IT management and unlocking the power of private cloud computing. I am referring to Rick Parker who, until recently, was the IT Director at Fetch Technologies, a longtime customer of Platform Computing that uses Platform ISF for its private cloud management. I am now proud to call Rick by a new name: Protostar! From a pool of more than 70 applicants, Rick was recognized in Constellation Research’s SuperNova Awards among an elite group of semifinalists who have overcome the odds in successfully applying cloud computing technologies within their organizations.

Most award programs recognize technology suppliers for advancements in the market. Few programs recognize individuals for their courage in battling the odds to effect change in their organizations. The Constellation SuperNova Awards celebrate the explorers and pioneers who successfully put new technologies to work and the leaders who have created disruptions in their market. Rick fits the bill perfectly, and an all-star cast of judges (including Larry Dignan at ZDNET and Frank Scavo at Constellation, to name a couple) has agreed. The award recognizes applicants who embody the human spirit to innovate, overcome adversity, and successfully deliver market-changing approaches.

Here is an excerpt from the award nomination to give you a peek into Rick’s story of implementing cloud computing at Fetch (with a little help from Platform Computing solutions!).
Since joining Fetch, Rick has successfully implemented a dynamic, nearly completely virtualized data center leveraging private clouds and Platform Computing’s ISF cloud management solution. Rick’s journey originally began with a simple idea: to build the perfect data center that moved IT away from the business of server management and toward true data center management. As part of this vision, Rick founded Bedouin Networks to create one of the first, if not the first, public cloud services in 2006 and deliver radically improved cost effectiveness and reliability in data center design.
In 2007, Rick carried his passion for disruptive data centers with him when he joined Fetch Technologies. Fetch Technologies is a Software-as-a-Service (SaaS) provider that enables organizations to extract, aggregate and use real-time information from websites and, as such, depends on its ability to maximize data center resources, efficiently and effectively. At any given moment, Rick and his IT team can get a call to provision more compute resources for Fetch’s fast-growing customer base. Before using Platform ISF, Fetch provisioned resources manually to increase SaaS capacity, which usually took several hours of personnel time per server. The cost effectiveness of Rick’s innovative design not only made Fetch’s products and services more profitable, it made them possible. The expenditure that would have been required for the hundreds of physical servers, networking, and data center costs to enable the additional capacity required could have exceeded the potential revenue.

A Protostar
Rick has been a vital asset to the Fetch team and an incredible partner to the Platform Computing team over the years. We congratulate him on this latest recognition and wish him the best of luck in his newest adventure – in search of more disruption – in his new position as Cloud Architect at Activision. We look forward to working with Rick on his next private cloud adventure. Congratulations, Protostar!

One small step for man, one enormous leap for science

News from CERN last week that E may not, in fact, equal mc² was earth shattering. As the news broke, physicists everywhere quivered in the knowledge that everything they once thought true may no longer hold, and the debate that continues to encircle the announcement this week is fascinating. Commentary ranges from those excited by the ongoing uncertainties of the modern world to those who are adamant mistakes have been made.

This comment from Matt Alden-Farrow on the BBC sums up the situation nicely:

“This discovery could one day change our understanding of the universe and the way in which things work. Doesn’t mean previous scientists were wrong; all science is built on the foundation of others work.”

From our perspective, this comment not only sums up the debate, but also the reality of the situation. Scientific discoveries are always built on the findings of those that went before and the ability to advance knowledge often depends on the tools available.

Isaac Newton developed his theory of gravity when an apple fell on his head – the sophisticated technology we rely on today just didn’t exist. His ‘technology’ was logic. Einstein used chemicals and mathematical formulae which had been discovered and proven. CERN used the Large Hadron Collider and high performance computing.

The reality is that scientific knowledge is built in baby steps, and the time these take is often determined by the time it takes for the available technology to catch up with the existing level of knowledge. If we had never known Einstein’s theory of relativity, who’s to say that CERN would have even attempted to measure the speed of particle movement?

IDC HPC User Forum – San Diego

Platform just returned from attending the IDC HPC User Forum held from Sept. 6-8 in San Diego.

As opposed to previous years, this year’s event seemed to have drawn fewer people from the second and third tiers of the HPC industry. Overall attendance also appeared to be about half that of the April event in Texas.

This time the IDC HPC User Forum was dominated by a focus on software and the need for recasting programming models. There was also a renewed focus on getting ISVs and open source development teams to adopt programming models that can scale far beyond the limits they currently have. Two factors are driving this emphasis:
  • Extremely parallel internals for compute nodes (from both a multi-core and an accelerator [CUDA, Intel, AMD] point of view).
  • The focus on “exa” scale, which by all counts will be achieved by putting together ever-increasing numbers of commodity servers.

Typically there is a theme to the presentations for the multi-day event, and this forum was no different. Industry presentations were very focused on the material science being performed primarily by the US national labs and future applications of the results being obtained. The product horizon on the technologies presented was estimated at approximately 10 years.

In contrast to the rest of the industry, which is very cloud focused right now, cloud computing was presented or mentioned only three times by various vendors and was also mentioned by the Lawrence Berkeley National Laboratory (LBNL) at the forum. When it comes to cloud, there seems to be a split between what the vendors are focusing on and what the attendees believe. Specifically, attendees from national laboratories tend to be focused on “capability” computing (e.g. large massively parallel jobs running on thousands of processors). Jeff Broughton from LBNL presented some data from a paper that showed how, for the most part, cloud computing instances are not ready for the challenge of doing HPC.

Though we can’t refute any of the data or claims made by Mr. Broughton, the conclusions that may be drawn from his data might extend beyond what the facts support. For instance, in our experience here at Platform, we’ve found that most HPC requirements in industry do not span more than 128 cores in a single parallel job nor do they require more than 256 GB of memory. The requirements for most companies doing HPC are significantly more modest and are therefore much more viable to be addressed by a cloud computing model.

We at Platform have long been fighting the “all or nothing” notion of HPC employing cloud technology. Rather, we believe that industry, especially in Tiers 2 and 3, will to a lesser or greater degree be able to make extremely beneficial use of cloud computing to address its more modest HPC requirements. Platform is focused on developing products to help these customers easily realize this benefit. Stay tuned for more on Platform’s cloud family of products for HPC in the coming months.

Blog Series – Five Challenges for Hadoop MapReduce in the Enterprise, Part 3

Challenge #3: Lack of Application Deployment Support

In my previous blog, I explored the shortcomings in resource management capabilities in the current open source Hadoop MapReduce runtime implementation. In this installment of the “Five Challenges for Hadoop MapReduce in the Enterprise” series, I’d like to take a different view on the existing open source implementation and examine the weaknesses in its application deployment capabilities. This is critically important because, at the end of the day, it is the applications that a runtime engine needs to drive. Without a sufficient support mechanism, a runtime engine will have only limited use.
To better illustrate the shortcomings of the current Hadoop implementation’s application support, the diagram below shows how the current solution handles workloads.

As shown in the diagram, the current Hadoop implementation does not provide multiple workload support. Each cluster is dedicated to a single MapReduce application so if a user has multiple applications, s/he has to run them in serial on that same resource or buy another cluster for the additional application. This single-purpose resource implementation creates inefficiency, a siloed IT environment and management complexity (IT ends up managing multiple resources separately).

Our enterprise customers have told us they require a runtime platform designed to support mixed workloads running across all resources simultaneously so that multiple lines of business can be served. Customers also need support for workloads that may have different characteristics or are written in different programming languages. For instance, some of those applications could be data intensive, such as MapReduce applications written in Java, and some could be CPU intensive, such as Monte Carlo simulations, which are often written in C++. A runtime engine must be designed to support both simultaneously. In addition, the workload scheduling engine in this runtime has to be able to handle many levels of fair share scheduling priorities and also be capable of handling exceptions such as preemptive scheduling. It needs to be smart enough to detect resource utilization levels so it can reclaim resources when they become available. Finally, a runtime platform needs to be application agnostic so that developers do not have to make code changes or recompile to adapt to the runtime engine supporting their applications. The architecture of the current Hadoop implementation simply does not provide the enterprise-class features required in a true production environment.
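
To give a feel for what fair share scheduling with preemption means in practice, here is a minimal single-level sketch. It is an illustration only, not how Platform's products or Hadoop implement scheduling; the function names, the workload names and the share weights are all hypothetical, and a real scheduler would handle several nested levels of shares.

```python
def fair_share(total_slots, workloads):
    """Hand out slots one at a time by weighted fair share.

    workloads maps a name to (share_weight, demand); weights must be > 0.
    Each slot goes to the workload that is furthest below its weighted
    share and still has unmet demand. Ties are broken by name.
    """
    allocation = {name: 0 for name in workloads}
    for _ in range(total_slots):
        hungry = [n for n, (_, demand) in workloads.items()
                  if allocation[n] < demand]
        if not hungry:
            break  # every workload is satisfied; leave the rest idle
        neediest = min(hungry,
                       key=lambda n: (allocation[n] / workloads[n][0], n))
        allocation[neediest] += 1
    return allocation


def slots_to_preempt(running, entitled):
    """Slots each workload holds beyond its current entitlement."""
    return {name: used - entitled.get(name, 0)
            for name, used in running.items()
            if used > entitled.get(name, 0)}


if __name__ == "__main__":
    # Hypothetical mix: a Java MapReduce job and a C++ Monte Carlo run.
    demand = {"mapreduce": (2, 10), "montecarlo": (1, 10)}
    entitled = fair_share(10, demand)
    # The Monte Carlo run had the whole cluster before MapReduce arrived.
    print(entitled, slots_to_preempt({"montecarlo": 10}, entitled))
```

The scheduler only needs the share weights and the current demand; the applications themselves are untouched, which is the application-agnostic property described above.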

VMworld Signage - "Are You Cloud Intense?"

VMworld always attracts some interesting booth strategies and signage.  We were highlighted as being one of the most innovative with our "Are you Cloud Intense?" theme on our shirts and booth imagery.

Show Your Cloud Intensity

I'm not sure we are philosophers, but we definitely know how to help customers build private clouds with IaaS management software!

The Release of Platform HPC 3.0.1 Dell Edition

Last week, Platform HPC 3.0.1 Dell Edition was released. One of the significant features of this release is the support of high availability for management nodes on the latest version of Red Hat Enterprise Linux 6.1. Why is support for management node HA special when many cluster management tools support it already?

Well, Platform HPC is not just a cluster management solution. It is an end-to-end cluster management and productivity tool. When handling management node failover, Platform HPC not only needs to ensure all the cluster management functionalities can fail over, it also has to ensure other functionalities fail over at the same time so that the end user and administrator won’t see the difference before and after the failover. The functionalities that failover handles include but are not limited to:

  1. Provisioning
  2. Monitoring & alerting
  3. Workload management service
  4. Web portal for users and administrators

In a heavy production environment, the failover function of the user web portal is far more critical than the failover of the cluster management functionality. This ensures users have non-stop access to the cluster through the web portal even if the primary management node running the web server is down.

Other capabilities included in Platform HPC 3.0.1 Dell Edition include:

  1. Dell hardware specific setup for management through IPMI
  2. One-To-Many BIOS configuration via the idrac-bios-tool
  3. Dell OpenManage integration
  4. Mellanox OFED 1.5.3-1 kit
  5. CUDA 4 kit
  6. Management software integration provided by QLogic and Terascale
  7. Dell fully automated factory install

With its complete and integrated software components, the cluster solution that Platform Computing delivers together with Dell has come a long way since the release of Open Cluster Stack in 2007.

Platform MapReduce v1.5 brings enhanced functionalities to MapReduce runtime

We are pleased to announce that Platform MapReduce 1.5 is now available. Compared to its predecessor, Platform MapReduce v1.0, released in late June, the new version brings a number of enhancements to the key functionalities the product delivers. Major improvements include the availability of the MapReduce application adapter technology, which allows users to execute their existing Hadoop applications without changing the code or recompiling. Enhancements have also been made to the runtime layer so that mixed workloads can run on the same cluster simultaneously. In addition, the support of IBM GPFS in the data layer is perhaps the most compelling capability in this release because it delivers a powerful solution to users running Hadoop applications on GPFS instead of a designated file system.

For more on Platform MapReduce 1.5, please see:

So what does it all boil down to? Well, there are a number of immediate benefits with the new version:  
  • The support of mixed workloads running on the same cluster simultaneously improves resource utilization and drives a shared services model for IT, so multiple business lines can share the same infrastructure and centralized IT management.
  • Increased developer productivity and choice. With the application adapter technology offered in Platform MapReduce 1.5, developers can build their applications using their preferred MapReduce programming framework and run their code without making changes to the code or recompiling, thus accelerating the application development cycle while eliminating vendor lock-in.
  • The integration of IBM GPFS and Platform MapReduce 1.5 allows users to run MapReduce applications directly on the data stored in GPFS instead of moving the data to a designated file system before the application execution, which can be a very costly operation. In addition, the combined technologies deliver the best of both worlds to users running MapReduce applications.
  • The unique capability in Platform MapReduce 1.5 of supporting a data input source that differs from the output destination provides a more efficient approach to the ETL function, as it eliminates the requirement for data staging at output, which can be a time-consuming and expensive operation.
The past couple of months have been busy and exciting for us at Platform. We are seeing increased interest in the key functionalities offered in Platform MapReduce 1.5. As Hadoop / MapReduce continues to gain market traction, users will become more educated on these emerging technologies and many will begin to move their MapReduce applications from labs to true production environments. We believe Platform MapReduce will play a critical role in this transition by delivering a reliable, efficient and proven solution to users running MapReduce in production.

Best of VMworld Award for Private Cloud Management

One of the fun elements at VMworld is participating in the “Best of VMworld” awards.  Platform Computing entered the Private Cloud Management category against 15+ vendors.  After several web meetings, live demos and reference reviews, we were awarded the Finalist position.  

Best of VMworld 2011 Private Cloud

This award follows other recent successes such as being named the #1 private cloud vendor by Forrester Research. Customers want a trusted vendor that has proven itself in large-scale, enterprise environments, and one that has been around for a few years. Companies just don't want to trust their data centers to a startup with an uncertain future. There is just too much at risk.

Platform team celebrates the win at VMworld!

Our private cloud / IaaS management solution, called Platform ISF, was cited for the following key differentiators:

  • Cloud in a cloud in a cloud – ability to support multiple hierarchical organizational structures with nested virtual private clouds for business units, groups, projects, etc.
  • Physical deployment – includes native capability to do bare metal physical provisioning all the way to complex multi-tier application structures
  • Application-centric framework – IT services are captured as standardized applications or application components that are multi-tiered, virtual and physical, private and public cloud, etc.
  • Automated provisioning of VMware – deep integration that maintains investment in VMware with automatic discovery, import (all or partial), synchronization, and simplification of provisioning VMware environments
  • Multi-hypervisor with KVM and Xen – adds a management layer similar to vCenter on top of KVM / Xen to normalize capabilities across hypervisors
  • Integration with P2V (physical to virtual) – support for multiple provisioning tools plus a Platform native option to support the ongoing transition of more applications and workloads to virtual environments
  • Managing resource pools on DRS clusters – adding control over resource pools on DRS clusters with end user self-service management aligned to hierarchical account management
  • Policy management - depth and breadth of policy management capabilities for both initial placement and operational service level agreements (SLAs) such as flexing up and down
  • One solution – Platform ISF is an integrated set of IaaS management capabilities that span self-service, billing, automation, etc. versus requiring multiple solutions from different vendors that require stitching together
Which of these are important in your environment?

Top 5 VMworld 2011 Takeaways for Private Cloud

4 days.  115 degrees outside.  20,000 of my closest virtualization friends.  VMworld Las Vegas was quite an experience.  The Platform Computing booth was busy from start to finish with a demo of our award-winning Platform ISF private cloud (IaaS) management solution.  

From attending the keynotes and speaking with attendees, I had 5 key takeaways from the event:

1. Private cloud is at a tipping point
Gone are the days of having to start every conversation by explaining the difference between virtualization and private cloud.  Of course this is a self-selected group but the education level has dramatically increased.  We had many people stopping by to discuss their private cloud initiatives and how to get started.

2. VMware licensing has upset many
The new vSphere 5 licensing has caused many companies to reevaluate their dependency on VMware. The recent price increases have disrupted their planning and created additional fears about being locked in to a costly hypervisor (vCloud = vLockIn). VMware management tools manage the VMware ESXi hypervisor, and that's it. There is a real fear of being locked into a specific stack and/or being required to upgrade to more expensive packages with lots of unnecessary functionality.

3. Multi-hypervisor is real
Even before the price increase, users had been evaluating a mixed hypervisor strategy. A common discussion is VMware’s closed support at the management layers. VMware is viewed as the gold standard when considering virtualizing production applications. End users already have or are seriously considering adding KVM (Platform is a member of the Open Virtualization Alliance) and, to a lesser extent, Xen into their roadmaps for non-production use cases (for example dev and test). These offer great density at lower cost.

4. The rise of application-centric thinking
In the early days of private clouds, most of the focus was on the transient IT services such as an OS or VM.  Typical end users were developers needing access to standardized IT environments.  Most of our conversations included discussions about additional levels of an application stack – including the middleware, application logic, and post-provisioning scripting.  In summary, an application-centric private cloud.  By standardizing IT service definitions to include these levels and automating delivery, companies can effectively define their own PaaS.  While we position Platform ISF primarily as IaaS management, many of our customers think we are selling the solution short.  Customers are using it to define and provision their own PaaS and SaaS applications.  Ironically this was a major focus of VMware’s messaging, albeit in a hosted PaaS environment.  Most of our larger enterprise customers want to tightly integrate their on-premise data center infrastructure and security systems into an on-premise personalized PaaS. 

5. Physical is part of cloud
When one thinks cloud, the natural mental model = virtualization.  However, I was surprised by the number of conversations where end users want to add integrated physical provisioning.  Sometimes the focus was for standalone bare metal provisioning.  But as we move more towards an application-framework, there are components in the app stack such as the database server that might only exist on a physical system for scalability and performance.  Operating within this mixed environment is a core requirement to deal with both non-production transient IT services as well as production apps with flexing and stringent service level agreements (SLA).

At Platform Computing, we are not cloud washing our products.  On top of being named the #1 vendor by Forrester, we received the Best of VMworld Finalist award for Private Cloud Management.

Were you there?  What were your takeaways?

HPC from A - Z (Part 26) - Z

Z is for… Zodiac
The wonders of the universe will remain of interest to the human race until the end of time or at least until we think we know everything about everything. Whichever comes first…

With the developments in technology of late, the amount of information we have about constellations and other universes is growing exponentially. For all we know, in fifty years we may be taking holidays in space or have discovered a form of life just like us in a far away galaxy.

But understanding what is really out there is far more complex than books and films make it seem. The pretty pictures of constellations don’t do astronomy justice! The amount of detail needed to track star and planet movements and understand which direction constellations are moving in requires some seriously high resolution telescopes. Just think about the amount of ‘zoom’ required to detect traces of flowing liquid on Mars. This is well beyond the capabilities of your standard Canon or Nikon – that’s for sure!

With high resolution come high data volumes. So, as in all the posts before this, HPC is crucial for cosmology, astrophysics, and high energy physics research. Without it, results could take years to find instead of months or minutes. By the time the path of the celestial sphere is mapped it could easily be into its second or even third cycle.

HPC can also be used in more theoretical contexts. For example, researchers at Ohio State University required the compute power provided by the Ohio Supercomputing Center Glenn Cluster to run simulations and modeling required for their study on the effects of star formation and growth of black holes.

As we finally reach the end of our ABC series, there’s no denying the critical role that compute power plays in our day-to-day lives. Technology is developing at a startling pace, and with each and every new development comes more data and a consequent need to process and make sense of it. Without HPC our technological advancements would not be nearly as fast, and we as a society would not have the insight and capabilities that we do today.

Blog Series – Five Challenges for Hadoop MapReduce in the Enterprise, Part 2

Challenge #2: Current Hadoop MapReduce implementations lack flexibility and reliable resource management

As outlined in Part 1 of this series, here at Platform, we’ve identified five significant challenges that we believe are currently hindering Hadoop MapReduce adoption in enterprise environments.  The second challenge, addressed here, is a lack of flexibility and resource management provided by the open source solutions currently on the market.

Current Hadoop MapReduce implementations derived from open source are not equipped to address the dynamic resource allocation required by various applications. They are also susceptible to single points of failure on the HDFS NameNode, as well as on the JobTracker. As mentioned in Part 1, these shortcomings are due to the fundamental architectural design of the open source implementation, in which the job tracker is not separated from the resource manager. As IT continues its transformation from a cost center to a service-oriented organization, the need for an enterprise-class platform capable of providing services for multiple lines of business will rise. In order to support MapReduce applications running in a robust production environment, a runtime engine offering dynamic resource management (such as borrowing and lending capabilities) is critical for helping IT deliver its services to multiple business units while meeting their service level agreements. Dynamic resource allocation capabilities promise not only to yield extremely high resource utilization but also to eliminate IT silos, thereby bringing tangible ROI to enterprise IT data centers.
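
As a rough illustration of the borrowing and lending idea, the sketch below lets a business unit use idle slots owned by another unit and hands them back as soon as the owner asks for them. It is a toy model under stated assumptions, not Platform's or Hadoop's implementation: the `SharedCluster` class, its methods and the unit names are hypothetical, and it reclaims slots instantly rather than waiting for running tasks to finish.

```python
class SharedCluster:
    """Toy model of slot borrowing and lending between business units."""

    def __init__(self, owned):
        self.owned = dict(owned)                  # unit -> slots it owns
        self.using = {unit: 0 for unit in owned}  # unit -> slots it holds now

    def free_slots(self):
        return sum(self.owned.values()) - sum(self.using.values())

    def borrowed(self, unit):
        """Slots the unit holds beyond what it owns."""
        return max(0, self.using[unit] - self.owned[unit])

    def request(self, unit, slots):
        """Grant up to `slots`: idle slots first, then reclaim lent-out ones."""
        granted = min(slots, self.free_slots())
        self.using[unit] += granted
        # A unit may only reclaim up to its own entitlement, never more.
        deficit = max(0, self.owned[unit] - self.using[unit])
        shortfall = min(slots - granted, deficit)
        for other in self.using:
            if shortfall == 0:
                break
            if other == unit:
                continue
            reclaimed = min(shortfall, self.borrowed(other))
            self.using[other] -= reclaimed
            self.using[unit] += reclaimed
            granted += reclaimed
            shortfall -= reclaimed
        return granted

    def release(self, unit, slots):
        self.using[unit] -= min(slots, self.using[unit])


if __name__ == "__main__":
    cluster = SharedCluster({"risk": 6, "analytics": 4})
    print(cluster.request("analytics", 8))  # borrows idle slots owned by "risk"
    print(cluster.request("risk", 6))       # owner reclaims what it lent out
    print(cluster.using)
```

While "risk" is idle its slots do useful work for "analytics", yet "risk" still gets its full entitlement on demand, which is the combination of high utilization and service level guarantees described above.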

Equally important is high reliability. An enterprise-class MapReduce implementation must be highly reliable, with no single points of failure. Some may argue that the existing solution in Hadoop MapReduce has shown very low rates of failure and therefore reliability is not of high importance. However, our experience and long history of working with enterprise-class customers have shown that in mission critical environments, the cost of one failure is measured in millions of dollars and is in no way justifiable for the organization. Eliminating single points of failure could significantly reduce the downtime risk for IT. For many organizations, that translates to faster time to results and higher profits.