Well, Platform HPC is not just a cluster management solution. It is an end-to-end cluster management and productivity tool. When handling management node failover, Platform HPC has to ensure not only that all the cluster management functions fail over, but also that the other functions fail over at the same time, so that end users and administrators see no difference before and after the failover. The functions that failover handles include, but are not limited to:
- Monitoring & alerting
- Workload management service
- Web portal for users and administrators
In a heavy production environment, failover of the user web portal is far more critical than failover of the cluster management functionality. Failing over the portal gives users non-stop access to the cluster even if the primary management node running the web server is down.
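To make the "everything fails over together" idea concrete, here is a minimal Python sketch. It is not how Platform HPC implements failover; the `Node` and `FailoverController` classes, the service names and the missed-heartbeat threshold are all assumptions made up for illustration. It only shows a standby node taking over every listed service in one step once the active node misses enough heartbeats.

```python
from dataclasses import dataclass, field

# Hypothetical service names based on the list above; not Platform HPC identifiers.
SERVICES = ["monitoring", "workload_manager", "web_portal"]


@dataclass
class Node:
    name: str
    running: set = field(default_factory=set)


class FailoverController:
    """Toy controller: moves all services to the standby node together."""

    def __init__(self, primary: Node, standby: Node, max_missed: int = 3):
        self.active = primary
        self.standby = standby
        self.max_missed = max_missed  # assumed threshold, for illustration only
        self.missed = 0
        self.active.running = set(SERVICES)

    def heartbeat(self, ok: bool) -> None:
        """Record one heartbeat result from the active node."""
        if ok:
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            self.fail_over()

    def fail_over(self) -> None:
        """Stop every service on the failed node and start them all on the standby."""
        self.active.running.clear()
        self.standby.running = set(SERVICES)
        self.active, self.standby = self.standby, self.active
        self.missed = 0


# Usage: three consecutive missed heartbeats trigger a failover.
controller = FailoverController(Node("mgmt1"), Node("mgmt2"))
for _ in range(3):
    controller.heartbeat(ok=False)
print(controller.active.name, sorted(controller.active.running))
```

The point of the sketch is that the web portal and the workload manager move in the same step as monitoring, so a user of the portal is never left pointing at a node whose other services have already moved.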
Other capabilities in Platform HPC 3.0.1 Dell Edition include:
- Dell hardware specific setup for management through IPMI
- One-To-Many BIOS configuration via the idrac-bios-tool
- Dell OpenManage integration
- Mellanox OFED 1.5.3-1 kit
- CUDA 4 kit
- Management software integration provided by QLogic and Terascale
- Dell fully automated factory install
With its complete and integrated software components, the cluster solution that Platform Computing delivers together with Dell has come a long way since the release of Open Cluster Stack in 2007.