Why Combine Platform Computing with IBM?
IBM’s Systems and Technology Group wants to remain a systems business, not a hardware business or a parts business. Therefore, IBM’s renewed emphasis is on systems software in its own right. IBM and Platform, the two complementary market leaders in technical computing systems and management software respectively, are coming together to provide overall market leadership and help customers do more cost-effective computing. In IBM speak, it’s smarter systems for smarter computing, enabling a Smarter Planet. Not smarter people. Just normal people doing smarter things, supported by smarter systems.
ISC Cloud 2011
The European conference on the intersection of cloud computing and HPC has just finished, and it’s very pleasing to report that it delivered generous helpings of useful and exciting information on the topic.
Though cloud computing and HPC have tended to stay separate, interest within the HPC community has been growing since the SC2010 conference, primarily because access to additional temporary resources is very tempting. Other reasons for HPC users and architects to evaluate the viability of cloud include total cost of ownership comparisons, and startup businesses that may need temporary access to HPC but lack the capital to purchase dedicated infrastructure.
Conclusions varied from presenter to presenter, though some things were generally agreed upon:
- If using Amazon EC2, HPC applications must use Cluster Compute instances to achieve performance comparable to local clusters.
- Fine-grained MPI applications are not well suited to the cloud, simply because none of the major providers offer InfiniBand or another low-latency interconnect on the back end.
- Running long term in the cloud, even with favorable pricing agreements, is much more expensive than running in local data centers, as long as those data centers already exist. (No one presented a cost analysis that included data center build costs as an amortized cost of doing HPC.)
Another interesting trend was the different points of view depending on where the presenter came from. Generally, researchers from national labs had the point of view that cloud computing was not comparable to their in-house supercomputers and was not a viable alternative for them. Also, compared to the scale of their in-house systems, the resources available from Amazon or others were seen as quite limited.
Conversely, presenters from industry had the opposite point of view (notably a presentation given by Guillaume Alleon from EADS). Their much more modest requirements seemed to map much better into the available cloud infrastructure and the conclusion was positive for cloud being a capable alternative to in-house HPC.
Perhaps this is another aspect of the disparity between capability and capacity HPC computing. One maps well into the cloud, the other doesn't.
Overall it was a very useful two days. My only regret was not being able to present Platform's view on HPC cloud. See my next blog for some technologies to keep an eye on for overcoming cloud adoption barriers. Also, if anyone is interested in HPC and the cloud, this was the best and richest content event I've ever attended. Highly recommended.
One small step for man, one enormous leap for science
This comment from Matt Alden-Farrow on the BBC sums up the situation nicely:
“This discovery could one day change our understanding of the universe and the way in which things work. Doesn’t mean previous scientists were wrong; all science is built on the foundation of others work.”
From our perspective, this comment not only sums up the debate, but also the reality of the situation. Scientific discoveries are always built on the findings of those that went before and the ability to advance knowledge often depends on the tools available.
Isaac Newton developed his theory of gravity when an apple fell on his head – the sophisticated technology we rely on today just didn’t exist. His ‘technology’ was logic. Einstein used chemicals and mathematical formulae which had already been discovered and proven. CERN used the Large Hadron Collider and high performance computing.
The reality is that scientific knowledge is built in baby steps, and the time these take is often determined by the time it takes for the available technology to catch up with the existing level of knowledge. If we had never known Einstein’s theory of relativity, who’s to say that CERN would have even attempted to measure the speed of particle movement?
The Economics of Cloud, Part I - IDC HPC User Forum
After attending IDC’s HPC User Forum in Houston last month and participating in an HPC cloud panel, it’s clear that many potential cloud users still seem confused about the economics of cloud and when it’s beneficial. One complaint we heard many times was that Amazon’s pricing model is a significant multiple (nearly three times) more expensive than an outright hardware purchase. While true, users who raise this complaint may be, at least partially, missing the primary use case for external cloud computing.
Our cloud panel didn’t have enough time to delineate the conditions and workloads where cloud computing offers economic advantages, so it seems appropriate to start that discussion here in the first of a series on the Economics of Cloud.
There are several factors that should inform the decision to run an HPC cloud computing pilot; most are important, though not strictly required, conditions. These include:
· Practical input and output data sizes, or post-processing methods that allow results to be processed without transferring data out of the cloud
· Serial or coarse-grained parallel workloads
· Data security policies that can be satisfied by the cloud
· Application OS and performance requirements that lead to acceptable performance in the cloud
· Unsteady workload requirements (meaning the amount of resource a workload requires varies over time)
This last factor is the one that might be the most confusing. Using IaaS can be very cost effective if the results from a workload are highly valuable and the workload is short lived. Conversely, workloads whose results are of unknown value, or that have lengthy execution durations or large data requirements, can have enormous charges associated with them.
One simple way of visualizing this is to understand the peak workload (expressed as a fraction of the available local resource) and the average workload. The difference between these two values, if significant, is a good indicator for whether cloud computing could have positive ROI or not. If this effect is plotted in time and the average and peak lines are overlaid, the term "peak shaving" is clearly an apt description of what benefit cloud computing can offer.
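As a rough illustration of that indicator, here is a minimal sketch in Python; the workload trace and numbers are hypothetical, not drawn from any customer data:

```python
# Hypothetical illustration of the "peak shaving" indicator described above.
# Each entry is hourly demand expressed as a fraction of the local cluster's capacity.
workload = [0.35, 0.40, 0.55, 1.60, 2.10, 0.90, 0.45, 0.38]  # made-up demand trace

peak = max(workload)
average = sum(workload) / len(workload)

# Hours where demand exceeds what the local cluster can absorb (fraction > 1.0)
burst_hours = [h for h in workload if h > 1.0]

print(f"peak demand     : {peak:.2f}x local capacity")
print(f"average demand  : {average:.2f}x local capacity")
print(f"peak/average gap: {peak - average:.2f}")
print(f"hours needing a cloud burst: {len(burst_hours)} of {len(workload)}")
```

A large gap between the peak and average lines is the signal that the extra capacity is only needed occasionally, which is exactly the case that pay-per-use pricing rewards.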
Invariably, a steady workload is most efficiently processed on a local data center resource when compared to pay-per-use rates. Indeed, most IaaS providers have built a factor of two to three times hardware costs into their pricing to account for the opportunity value of near-instantaneous access to compute resources. Thus, paying this "tax" for a steady workload could have disastrous financial consequences if adopted as a strategy.
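A back-of-the-envelope comparison makes the point. The 2.5x multiplier and per-core-hour rate below are purely illustrative assumptions, not any provider’s actual pricing:

```python
# Illustrative only: a steady, fully utilized workload run locally versus on IaaS,
# assuming cloud pay-per-use costs 2.5x the amortized local cost per core-hour.
LOCAL_COST_PER_CORE_HOUR = 0.04   # assumed amortized local cost (hardware, power, admin)
CLOUD_PREMIUM = 2.5               # assumed IaaS markup over hardware cost
CLOUD_COST_PER_CORE_HOUR = LOCAL_COST_PER_CORE_HOUR * CLOUD_PREMIUM

cores, hours_per_year = 512, 8760  # constant demand all year round
local_annual = cores * hours_per_year * LOCAL_COST_PER_CORE_HOUR
cloud_annual = cores * hours_per_year * CLOUD_COST_PER_CORE_HOUR

print(f"local : ${local_annual:,.0f} per year")
print(f"cloud : ${cloud_annual:,.0f} per year ({CLOUD_PREMIUM:.1f}x)")
```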
Anyone interested in permanent or long term cloud resource access should probably investigate longer term service contracts with a selected IaaS provider if local resources are not an option. Such an alternative agreement could easily change any potential negative financial estimates for the benefits of cloud.
HPC from A-Z (part 8) - H
High Energy Physics – When it comes to high energy physics, there is only one name in the frame: CERN. The European Organization for Nuclear Research is using HPC for the Large Hadron Collider, a 27-kilometer ring-shaped particle accelerator buried about 100 meters underground and designed to improve our understanding of atoms and the universe. This is quite possibly the most fascinating and imaginative use of HPC. Can you think of one that trumps it?
CERN depends on computing power to ensure that 17,000+ scientists and researchers in 270 research centers in 48 countries can collaborate on a global scale to solve the mysteries of matter and the universe. An HPC environment is central to this, and enables scientists and researchers to quickly analyze the data.
Not to burst your bubble…
Probably everyone who has ever been a user in an HPC environment has run into resource constraints. Limits on hardware, licenses, memory, or maybe even GPUs on hybrid compute engines rank among the biggest limitations users are facing today. The problem comes when you run into these limitations in the middle of running a job. What do you do then? “Hitting the stops,” so to speak, will often trigger another procurement cycle and consume lots of resources in analysis, internal meetings, planning, and the purchase and deployment phases before any additional real work can get completed.
Cloud computing, or in this case cloud bursting, offers an approach to mitigating the process and the limitations that most corporate HPC consumers go through today. Certainly, using resources outside the firewall has its own challenges for corporate users, but those aren’t the focus of this blog.
Assume for a moment that security, licensing, provisioning latency and data access are not a problem. Of course, they’re all major issues that need to be addressed to make a cloud solution usable, but bear with me. There are still some important questions that need to be answered:
- What are the appropriate conditions to start up and provision cloud infrastructure?
- What jobs should be sent to the cloud once that infrastructure is provisioned and which should stay local and wait?
This second question is the focus here. Often in science, the hardest part of solving a tough problem is stating the question properly. In this case, the question is nicely represented by the inequality below, where each term is an elapsed time. For cloud computing to be advantageous from a performance perspective:
(Data upload to cloud) + (Cloud Pend time) + (Cloud Run time) + (Data download from cloud) < (Local Pend time) + (Local Run time)
Such a statement allows us to draw a few conclusions about the conditions for when cloud bursting is advantageous for the HPC user:
- When local pend time estimates for a job get very large
- When local elapsed run time is large -- A corollary to this condition is that if the job can be parallelized, but there are insufficient resources locally to run the job quickly, then cloud bursting the job may return results to the user sooner than allowing the job to run on insufficient resources locally.
- When the job’s data transfer requirements into and out of the cloud are small
In addition to those conditions, we start to see where several of the real challenges lie for a scheduler trying to make the right decision about which jobs get sent to the cloud and which don’t. For instance, most schedulers today do not consider the data volume associated with a job. But in a cloud scenario, the associated data transfer times could be 2-50x the runtime of a job, dependent not only on file size but also on the available transfer bandwidth. Schedulers will need to evolve on several levels to tackle this challenge:
- Allow users to indicate the files (both input and output) required for each job.
- Estimate pending and run time for disparate infrastructures
- Estimate the run time for jobs that run in parallel
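To make the inequality above concrete, here is a minimal sketch of the kind of burst-or-stay decision a data-aware scheduler would need to make. Every function and number in it is a hypothetical placeholder for illustration, not an existing scheduler API:

```python
# Minimal sketch of the cloud-bursting decision captured by the inequality above.
# All estimates are hypothetical placeholders, not a real scheduler interface.

def transfer_time(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Seconds needed to move a job's input or output data over the available link."""
    return num_bytes / bandwidth_bytes_per_s

def should_burst(job: dict, local: dict, cloud: dict) -> bool:
    """Return True when estimated cloud turnaround beats local turnaround."""
    upload = transfer_time(job["input_bytes"], cloud["bandwidth"])
    download = transfer_time(job["output_bytes"], cloud["bandwidth"])
    cloud_turnaround = upload + cloud["pend_time"] + cloud["run_time"] + download
    local_turnaround = local["pend_time"] + local["run_time"]
    return cloud_turnaround < local_turnaround

# Example: modest data volumes but a long local queue (illustrative numbers, in seconds and bytes)
job = {"input_bytes": 2e9, "output_bytes": 5e8}
local = {"pend_time": 6 * 3600, "run_time": 2 * 3600}   # busy local cluster
cloud = {"pend_time": 10 * 60, "run_time": 2 * 3600,    # near-instant provisioning
         "bandwidth": 50e6}                              # ~50 MB/s WAN link

print("burst to cloud" if should_burst(job, local, cloud) else "stay local")
```

Even this toy version shows why the file lists and bandwidth estimates in the list above matter: without them, the upload and download terms simply cannot be computed.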
HPC from A-Z (part 2) - B
Biology - One of our customers, The Sanger Institute, is a genome research institute that is primarily funded by the Wellcome Trust. It has participated in some of the most important advances in genomic research, developing new understanding of genomes and their role in biology. That type of research requires a great deal of computational power so scientists can perform large-scale analysis, such as quickly comparing similar genomic structures.
For more information on how Sanger benefits from an HPC environment please have a look at our video.
HPC is helping biology researchers find out what we are made of. Next week we’ll look at how HPC is helping companies develop and design better consumer products.
Straw Poll Shows HPC Looking to the Cloud
We spoke to 100 delegates at the conference from a number of disciplines (research, government, education, and manufacturing, to name a few), and almost two-thirds of them have already been experimenting with public and private cloud environments within their organizations. What’s more, they’re generally happy with their cloud experiences so far, and many of those who have not done cloud trials are planning to do so within the next 12 months.
What a far cry from last year. In our 2009 survey, users were only “considering” establishing a private cloud, let alone actually starting a cloud initiative. What’s driving this move toward experimenting with both public and private clouds? According to our survey, it is the ability to offload applications and workloads to public cloud providers (23 percent) and to burst workloads (15 percent). HPC users have generally been skeptical about the hype around cloud but eager to reduce costs as they require more performance and scale. Our recent webinar on “when offloading to the cloud works and doesn’t,” along with having the proper cloud strategy, offers some useful considerations for organizations considering HPC in the cloud.
If you’re interested in learning more, check out our whitepaper on HPC cloud scenarios—we’ve got customers running some really fascinating HPC solutions in the cloud right now.
Storm is brewing – HPC Cloud
At the 2009 Supercomputing conference in Portland last year, Platform Computing showed off our first-generation cloud computing management tool, Platform ISF. At that time, the “cloud” buzzword was still fairly new to the HPC community and had several stern critics in the HPC space. To many HPC folk, “cloud” meant virtualization, and virtualization meant low performance. Very few other vendors at the conference even used the C-word, and when they did it was to describe other types of computing (e.g. enterprise computing, dynamic datacenter provisioning, etc.). So for a while, many believed, as we did, that virtualization takes the “H” out of “HPC.”
In contrast to last year’s conference, this year both software vendors and infrastructure vendors were present talking about making cloud adoption easy, with every hardware vendor trying to persuade potential customers that building a private cloud using their hardware was a smart choice – especially when the vendor offers their own IaaS model for workload overflow (Platform calls that “Cloud Bursting”).
Also in contrast to 2009, this year showed that hypervisors and processors alike have matured to better support near-hardware performance with virtualization. Indeed, the performance chasm for some applications has narrowed to a crack (for more on this, see our Platform whitepaper). Perceptions of the cloud have also started to change in the HPC community. For the right jobs, virtualization doesn’t have to mean an unacceptable performance burden, and the advantages it brings to management, not to mention flexibility, are hard to ignore.
This year at SC’2010 we gave almost the same demonstration, with more polish. The difference was that the reaction had turned from disdain and skepticism to curiosity and interest. Yes, there are still several issues that need to be sorted before cloud computing is simple for HPC (licensing, data movement, and data security are the big ones). Nevertheless, HPC users are finally beginning to think about the cloud, and performance is becoming less and less of an issue. Amazon, for instance, let their HPC performance data speak for itself at the show: a cluster running HPC workloads on EC2 placed 230th in the TOP500 (see http://www.top500.org/system/10661). So there’s no debating it -- you can do HPC in the external public cloud, at least if you’re running Linpack.
Even if your application may be difficult to adapt to the cloud, the barriers are falling one by one. So taking the longer term view, over the next 5 years HPC in the cloud doesn’t seem merely feasible; it seems -- as we at Platform Computing believe -- inevitable.