And the winner is... Jaguar Land Rover for business IT project of the year!

At a glittering event in London, Platform customer Jaguar Land Rover (JLR) was awarded the prestigious business project of the year award at the annual UK IT Industry Awards, hosted by Computing magazine in the UK.

Platform Computing worked with Jaguar Land Rover to create an advanced IT environment to underpin the organisation’s virtual car product development while complying with strict safety and environmental regulations. JLR has deployed a state-of-the-art system consisting of scalable compute clusters and engineering workstations, all built from commodity technologies. The project was recognised for its complexity and for its ability to reduce the time to market, engineering costs and environmental impact of product development for JLR.

Big congratulations to the team for this significant achievement!

Big Data report from SC’11

In my previous blog I expressed high expectations for the Big Data-related activities at this year’s Supercomputing conference. Coming back from the show, I’d say the enthusiasm for and knowledge of Big Data within the HPC community truly surprised me. Here are the major highlights from the show:

  • Good flow of traffic at the Platform booth for Platform MapReduce. Many visitors stopped by our booth to learn more about Platform MapReduce, a distributed runtime engine for MapReduce applications. I found it easy to talk to the HPC crowd because many folks in this community are already familiar with Platform LSF and Platform Symphony, both flagship products from Platform that have been deployed and tested in large-scale distributed computing environments for many years. Since Platform MapReduce is built on similar core technology to those mature products, the HPC community quickly understood the key features and functions it brings to Big Data environments. Even though many users are still at an early stage of either developing MapReduce applications or looking into new programming models, they understand that a sophisticated workload scheduling engine and resource management tool will become critically important once they are ready to deploy their applications into production. Many HPC sites were also interested in exploring the potential of leveraging their existing infrastructure for processing data-intensive applications; for instance, we were frequently asked how MPI and MapReduce jobs can coexist on the same cluster (a toy sketch of the idea follows this list). The good news: Platform MapReduce is the only solution that supports such mixed workloads.
  • “Hadoop for Adults” -- this was a quote from one of the attendees after sitting through our breakfast briefing on overcoming MapReduce barriers. We LOVE it! The briefing drew over 130 people, well exceeding our expectations! Our presentation on how to overcome the major challenges in current Hadoop MapReduce implementations drew great interest. “Hadoop for Adults” sums up the distinct benefits Platform MapReduce brings: Platform Computing knows how to manage large-scale distributed computing environments, and bringing that same technology into Big Data environments is a natural extension for us. The reaction to Platform MapReduce at SC’11 was encouraging and a validation of our expertise in scheduling and managing workloads and the overall infrastructure of a data center.
  • Growing momentum in application development. As sophisticated as always, the HPC community is at the forefront of developing applications to solve data-intensive problems across various industries and disciplines: cyber security, bioinformatics, electronics and financial services are just a few examples. Many Big Data-related projects are being funded at HPC data centers, and we expect a proliferation of applications to come out of those projects very soon.
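
For readers curious what mixed MPI and MapReduce workloads on one cluster might look like, here is a toy sketch in Java. Everything in it (the class names, the simple slot-counting policy) is a hypothetical illustration, not Platform MapReduce’s API; the point is only that a single resource pool can serve both workload kinds instead of two fenced-off clusters.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Toy illustration of one cluster serving two workload types.
 * Names and the slot-counting policy are hypothetical; this is NOT
 * the Platform MapReduce API, just a sketch of the idea that a single
 * resource manager can arbitrate slots between MPI jobs and
 * MapReduce tasks.
 */
public class MixedWorkloadScheduler {

    enum Kind { MPI, MAP_REDUCE }

    record Job(String name, Kind kind, int slotsNeeded) {}

    private final int totalSlots;
    private int freeSlots;
    private final Queue<Job> pending = new ArrayDeque<>();

    MixedWorkloadScheduler(int totalSlots) {
        this.totalSlots = totalSlots;
        this.freeSlots = totalSlots;
    }

    void submit(Job job) { pending.add(job); }

    /** Dispatch pending jobs while slots remain, sharing one pool. */
    void dispatch() {
        while (!pending.isEmpty() && pending.peek().slotsNeeded() <= freeSlots) {
            Job job = pending.poll();
            freeSlots -= job.slotsNeeded();
            System.out.printf("dispatch %-12s kind=%-10s slots=%d (free=%d/%d)%n",
                    job.name(), job.kind(), job.slotsNeeded(), freeSlots, totalSlots);
        }
    }

    public static void main(String[] args) {
        MixedWorkloadScheduler s = new MixedWorkloadScheduler(64);
        s.submit(new Job("cfd-solve", Kind.MPI, 32));        // tightly coupled MPI job
        s.submit(new Job("log-crunch", Kind.MAP_REDUCE, 16)); // data-parallel MapReduce tasks
        s.submit(new Job("genome-map", Kind.MAP_REDUCE, 16));
        s.dispatch(); // all three coexist on the same 64-slot cluster
    }
}
```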

The show is officially over, but the excitement around Big Data will continue. For me, not only have I gained tremendous insight into the Big Data momentum in HPC, but I’m also pleased to see the overwhelming reaction to Platform MapReduce within the HPC community. Nothing beats pitching the right product to the right audience!

Linux Interacting with Windows HPC Server

There are many interesting technology showcases at SC’11 this week. One of the Birds of a Feather sessions will discuss a solution implemented at the Virginia Tech Transportation Institute (VTTI) that mixes Linux with Windows HPC Server in a single cluster for processing large amounts of data.

Without a proper tool or a great deal of practice, getting Linux and Windows to work together seamlessly and present a unified interface to end users is very challenging. Having both systems coexist in an HPC cluster adds an order of magnitude of complexity to an already complex HPC Linux cluster environment.

This is because Windows and Linux “speak very different languages” in many areas, such as user account management, file paths and directory structures, cluster management practices, and application integrations.

The good news is that the Platform Computing engineering team did some heavy lifting in product development for this project. Platform HPC integrates the full software stack required to run an HPC Linux cluster, and its major differentiator compared to alternative solutions is that it is application aware. When Windows HPC Server is added to the cluster, Platform HPC provides a unified user experience across Linux and Windows and hides the differences and complexity between the two operating systems.

The Platform HPC team has developed a step-by-step guide to implementing an end-to-end solution that covers provisioning both Windows and Linux, unified user authentication, unified job scheduling, automated workload-driven OS switching, application integrations, and unified end-user interfaces.

This solution significantly reduces the complexity of a mixed Windows and Linux cluster, so users can focus on their applications and productive work rather than on managing that complexity.
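
To make the idea of workload-driven OS switching a little more concrete, here is a minimal, hypothetical sketch of the decision logic: when the pending Windows queue is hotter than the Linux one, an idle node gets re-provisioned to Windows HPC Server, and vice versa. The names and the pressure heuristic below are illustrative assumptions on our part, not Platform HPC’s actual implementation.

```java
/**
 * Hypothetical sketch of "workload-driven OS switching": when the
 * pending Windows queue outgrows the share of nodes serving Windows,
 * an idle Linux node is re-provisioned to Windows HPC Server, and
 * vice versa. Illustrative only; Platform HPC's real logic differs.
 */
public class OsSwitchPolicy {

    enum Os { LINUX, WINDOWS }

    /** Decide which OS an idle node should boot next. */
    static Os nextOsForIdleNode(int pendingLinuxJobs, int pendingWindowsJobs,
                                int linuxNodes, int windowsNodes) {
        // Compare queue pressure per node already serving each OS.
        double linuxPressure = (double) pendingLinuxJobs / Math.max(1, linuxNodes);
        double windowsPressure = (double) pendingWindowsJobs / Math.max(1, windowsNodes);
        return windowsPressure > linuxPressure ? Os.WINDOWS : Os.LINUX;
    }

    public static void main(String[] args) {
        // 40 Linux jobs on 30 Linux nodes vs 25 Windows jobs on 10 Windows
        // nodes: the Windows queue is hotter, so the idle node switches.
        System.out.println(nextOsForIdleNode(40, 25, 30, 10)); // -> WINDOWS
    }
}
```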

Wading through the Pre-SC’11 HPC News

The pre-SC11 news cycle has already started to heat up, with announcements from new vendors who see the promise of cloud computing meeting the needs of HPC users everywhere.

Just recently, a competitor of Platform Computing announced their entry into the “cloud bursting” space. They claim that their technology functions with all of the common workload management systems. With little detail given, the only conclusion that can be drawn from the announcement is that they have built a system that runs a “poll – act – sleep” loop. While possibly illustrative of the promise of cloud computing, this simplistic view ignores what we believe is the most fundamental challenge to cloud bursting: data locality.
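
For illustration, here is roughly what such a “poll – act – sleep” loop amounts to, sketched in Java with hypothetical stub calls standing in for the scheduler and cloud-provider APIs. Note what is missing: any notion of where the data lives.

```java
/**
 * A caricature of the "poll - act - sleep" style of cloud bursting
 * described above (a hypothetical sketch, not any vendor's code):
 * watch the queue, start cloud nodes when it is deep, sleep, repeat.
 */
public class NaiveCloudBurster {
    public static void main(String[] args) throws InterruptedException {
        final int burstThreshold = 100; // pending jobs that trigger a burst
        while (true) {
            int pending = pollQueueDepth();      // poll
            if (pending > burstThreshold) {
                launchCloudNodes(pending / 10);  // act
            }
            Thread.sleep(60_000);                // sleep, then repeat
        }
    }

    // Stubs standing in for real scheduler and cloud-provider calls.
    static int pollQueueDepth() { return 0; }
    static void launchCloudNodes(int n) { }
}
```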

By “data locality” we refer to the fact that compute resources must have access to data in order to be useful. When datasets get large (as input or output), access to globally distributed compute resources may be of dubious value until workload schedulers understand what data exists where and can effect transfers of data between localities to take better advantage of those remote resources.
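
A minimal sketch of what a locality-aware decision could look like, with made-up numbers and names: the scheduler estimates total completion time including the data transfer, rather than just counting idle remote cores.

```java
/**
 * Hypothetical sketch of the data-locality point: a burst decision
 * should weigh data-transfer time against queueing delay, not just
 * count free cloud cores. All figures and names are illustrative.
 */
public class LocalityAwarePlacement {

    /** Estimated completion time if the job runs at a given site. */
    static double estimatedHours(double datasetGB, double linkGbps,
                                 double queueWaitHours, double computeHours) {
        double transferHours = (datasetGB * 8) / (linkGbps * 3600); // GB -> Gb
        return transferHours + queueWaitHours + computeHours;
    }

    public static void main(String[] args) {
        double datasetGB = 5_000;  // 5 TB of input data
        // Local cluster: data already in place, but an 8-hour queue.
        double local = estimatedHours(0, 10, 8.0, 4.0);
        // Cloud site: nearly idle, but the dataset must cross a 1 Gbps link.
        double cloud = estimatedHours(datasetGB, 1, 0.5, 4.0);
        // The "free" cloud capacity loses once the transfer is counted.
        System.out.printf("local=%.1fh cloud=%.1fh -> run %s%n",
                local, cloud, local <= cloud ? "locally" : "in the cloud");
    }
}
```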


Separately, we are very pleased to see others in the HPC industry focusing on the idea of ease of use and ease of build. In fact, just recently Rightscale made an announcement to this effect for building clusters in the cloud. Platform has been talking about the importance of this for some time, notably with our Platform HPC product. Stay tuned for more announcements as we take this same story into the cloud.

Finally, we were also very happy to see that HPC as a service continues to gain momentum. Just recently there was a story from the Netherlands, where a small cluster of very “thick” servers running KVM has been used to create a service-based HPC infrastructure to be rented to researchers in the academic and government communities.

No doubt these announcements are just the beginning of a coming onslaught from the many vendors waiting to announce during the week. We expect that cloud and HPC will move one step closer together than they were last year in New Orleans. Stay tuned to the Platform blogs; we’ll provide a summary of the show and any key items we hear about.

Big Data’s Big Show at SC’11

With less than a week to go, everyone in the HPC community is gearing up for SC’11. As someone who has been to SC for the past seven years, I was pleased to see that Big Data appears to be the new kid on the block this year. Roughly 20 sessions will be dedicated to Big Data-related topics at this year’s Supercomputing show, from basic training on Hadoop and MapReduce to the challenges and opportunities of exascale data analytics, so we will hear a wide range of discussions on Big Data. Platform Computing will also be hosting a free breakfast briefing on how to overcome your MapReduce barriers on the morning of Wednesday, Nov. 16 at the show. Registration details can be found here.

Traditionally, hot topics in HPC revolve around performance, scaling, latency and bandwidth. It is only in the last couple of years that data-intensive computing has become an area of interest in HPC, and it is heating up quickly! The reality is that now that hardware is getting faster and cheaper (thanks mainly to advances in processor technology), users can run more problems faster. As a result, more data is being generated, and much of that data contains tremendous insights we could use to make better products and decisions. Sure, Big Data exists in Web 2.0, retail and telco companies, as well as many other verticals in the enterprise space, yet there is no shortage of use cases in HPC. Cybersecurity, fraud detection and next-gen sequencing analysis are just a few examples that fall into the HPC arena; applications in these spaces are often both computationally and data intensive.

HPC has long been considered the incubator for bleeding-edge innovations that later trickle down to benefit mainstream applications, and we believe Big Data is no exception. The annual Supercomputing show is always about showcasing leading-edge science and technology, and Big Data certainly fits the bill. I am looking forward to learning about new solutions for Big Data problems developed in HPC, as well as getting a better understanding of the specific requirements of this particular market. It will be an exciting week ahead, and we expect to hear some big buzz around Big Data at SC’11!

Stop by and visit Platform Computing at SC11 in booth #1117!

Blog Series – Five Challenges for Hadoop MapReduce in the Enterprise, Part 5

Challenge #5: Lack of Multiple Data Source Support

With this blog entry, we have reached the final chapter of the Hadoop challenge series. In this blog, I am going to discuss the fifth challenge for current Hadoop implementations: the lack of multiple data source support.

The current Hadoop implementation supports only one distributed file system at a time, the default being the Hadoop Distributed File System (HDFS). This restriction creates barriers for organizations whose data is stored in file systems other than HDFS when implementing MapReduce applications in Hadoop. For non-HDFS users, running MapReduce applications in a Hadoop environment means they must first move their data from their current file system into HDFS, which can turn into a time-consuming and very expensive operation. This limited support for heterogeneous data also leads to inferior performance and poor resource utilization due to an inflexible infrastructure.

The reality is that users want a platform that 1) supports various types of input data at the same time and outputs to their desired data sources (which could differ from the input type); and 2) completes the data conversion at runtime, so no extra ETL steps are needed after the run. (See the chart below for a high-level architecture for heterogeneous data support.) For instance, a user can read input data stored in HDFS and write output to a relational database such as Oracle or MySQL upon completion of the run. Such capabilities eliminate data movement at both the beginning and the final stages of a MapReduce run, dramatically reducing cost while improving operational efficiency and driving faster time to results.

A high-level diagram of heterogeneous data support

This runtime support for heterogeneous data can be considered an alternative to the traditional ETL function. Its advantage over existing ETL tools is that it provides a faster, cheaper and more integrated platform for users running Big Data applications.
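
To picture the architecture in the diagram above, here is a hypothetical sketch of pluggable input and output connectors feeding a single run. The interfaces, names and toy data are made up for illustration; they are not Platform MapReduce’s actual API.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the heterogeneous-data idea pictured above:
 * a run wired to one connector for input (say, HDFS) and a different
 * one for output (say, a relational database), so no separate ETL
 * pass is needed before or after the job. Illustrative only.
 */
public class HeterogeneousRun {

    interface InputConnector  { List<Map<String, String>> readSplits(String uri); }
    interface OutputConnector { void write(List<Map<String, String>> records, String uri); }

    static void run(InputConnector in, String inUri,
                    OutputConnector out, String outUri) {
        // Read from one source (map/reduce would happen here), write to another.
        List<Map<String, String>> records = in.readSplits(inUri);
        out.write(records, outUri);
    }

    public static void main(String[] args) {
        // Toy stand-ins: HDFS-style input, JDBC-style output.
        InputConnector hdfs = uri -> List.of(Map.of("word", "hello", "count", "1"));
        OutputConnector jdbc = (recs, uri) ->
                recs.forEach(r -> System.out.println("INSERT INTO results " + r + " @ " + uri));
        run(hdfs, "hdfs://cluster/logs/", jdbc, "jdbc:mysql://db/warehouse");
    }
}
```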

Having identified the five major challenges in the current Hadoop MapReduce implementation, we at Platform Computing have developed a solution: Platform MapReduce, an enterprise-class distributed runtime engine that not only addresses the barriers discussed in this blog series but also brings additional capabilities requested by users who want to move their Big Data applications into production. For detailed features and benefits delivered by Platform MapReduce, please visit: http://www.platform.com/mapreduce

Please join us at SC11 for a free breakfast briefing: “Overcoming Your MapReduce Barriers”. Register today to secure your spot!