HPC Cloud and Analytics
Big congratulations to the team for this significant achievement!
Without a proper tool or a lot of practice, getting Linux and Windows to work together seamlessly, with a unified interface for end users, is a very challenging task. Having both systems coexist in an HPC cluster adds an order of magnitude more complexity compared with an HPC Linux cluster, which is already complex enough on its own.
This is because Windows and Linux "speak very different languages" in many areas, such as user account management, file paths and directory structure, cluster management practice, and application integration.
The good news is that the Platform Computing engineering team did some heavy lifting in product development for this project. Platform HPC integrates the full software stack required to run an HPC Linux cluster. Its major differentiator compared with alternative solutions is that Platform HPC is application aware. When Windows HPC Server is added to the cluster, Platform HPC provides a unified user experience across Linux and Windows and hides the differences and complexity between the two operating systems.
The Platform HPC team has developed a step-by-step guide for implementing an end-to-end solution. It covers provisioning of both Windows and Linux, unified user authentication, unified job scheduling, automated workload-driven OS switching, application integration, and unified end-user interfaces.
This solution significantly reduces the complexity of a mixed Windows and Linux cluster, so users can focus on their applications and productive work rather than on managing that complexity.
The European conference on the intersection of cloud computing and HPC has just finished, and I am pleased to report that it delivered plenty of useful and exciting information on the topic.
Cloud computing and HPC have tended to stay separate. Since the SC2010 conference, however, interest within the HPC community has been growing, primarily because access to additional temporary resources is very tempting. HPC users and architects have other reasons to evaluate the viability of the cloud as well. One is total cost of ownership comparisons. Another is the needs of startup businesses, which may require temporary access to HPC but lack the capital to purchase dedicated infrastructure.
Conclusions varied from presenter to presenter, though some things were generally agreed upon:
Another interesting trend was how points of view differed depending on where the presenter came from. Researchers from national labs generally felt that cloud computing was not comparable to their in-house supercomputers and was not a viable alternative for them. They also saw the resources available from Amazon and other providers as quite limited compared with the scale of their in-house systems.
Conversely, presenters from industry took the opposite view (notably Guillaume Alleon from EADS). Their much more modest requirements seemed to map much better onto the available cloud infrastructure, and they concluded that the cloud can be a capable alternative to in-house HPC.
Perhaps this is another aspect of the disparity between capability and capacity computing in HPC. Capacity workloads, made up of many modest jobs, map well onto the cloud. Capability workloads, which need very large systems for a single problem, do not.
Overall, it was a very useful two days. My only regret was not being able to present Platform's view on HPC cloud. See my next blog post for some technologies to keep an eye on for overcoming barriers to cloud adoption. Finally, for anyone interested in HPC and the cloud, this was the best and most content-rich event I have ever attended. Highly recommended.
Well, Platform HPC is not just a cluster management solution. It is an end-to-end cluster management and productivity tool. When the management node fails over, Platform HPC must ensure that all cluster management functions fail over. It must also ensure that its other functions fail over at the same time, so that end users and administrators see no difference before and after the failover. The functions covered by failover include, but are not limited to:
Other capabilities in Platform HPC 3.0.1 Dell Edition include:
During system installation and configuration, several areas demonstrated the advantages of partnering with Platform Computing:
(1) Management software: Platform HPC was chosen to manage the system. The scalability and maturity of its software components simplified the installation and configuration of the management software layer. Both the workload scheduler (based on Platform LSF) and the MPI library (Platform MPI) scaled effortlessly on the system.
(2) MPI expertise: To achieve maximum Linpack performance, it is critical to optimize MPI performance. During installation and configuration, the Platform MPI development team provided numerous best practices to help maximize the benchmark results, from checking cluster health to tuning MPI performance. They collaborated closely with developers from QLogic, which provided the InfiniBand interconnect.
(3) Dynamic zoning: The system will be used by multiple research user groups, each with its own workload management instance. The size of each group's workload management zone changes over time based on that group's workload. Each zone has its own user account management system and scheduling policies. Platform HPC was set up to manage such dynamic configuration changes easily.
The maturity of Platform HPC and the expertise of Platform Computing's development and services teams played a key role in the success of this Acer project. The maximized performance and stability of the benchmark runs allowed the results to be submitted in time for the June TOP500 list. Most importantly, once the system is in the hands of hundreds of users in production, three things will determine the quality of service this top Taiwanese supercomputer delivers. These are the robustness of the workload management, the performance of MPI, and support from the experts who built the software.