Hadoop MapReduce low latency matters! BGI Shenzhen wins IDC Innovation Award


Last week IDC released the winners for the HPC Innovation Excellence Awards.

As per IDC’s announcement, BGI Shenzhen saved millions of dollars while enabling faster processing of its large genome data sets. The IBM Platform Symphony product was used for the Hadoop MapReduce applications. Platform Symphony provides a low-latency Hadoop MapReduce implementation; its unique shared memory management, combined with a data-aware, low-latency scheduler, accelerates many life sciences applications by as much as 10 times over open source Apache Hadoop.
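To make the kind of workload concrete, here is a toy, Hadoop-streaming-style k-mer counter in Python. It is only a sketch of the map and reduce steps a genomics pipeline typically runs; it uses none of Platform Symphony's actual APIs, and the k-mer length is an arbitrary choice for illustration.

    # kmer_count.py -- toy MapReduce-style k-mer counter (illustration only;
    # this is not Platform Symphony's API, just the shape of such a workload)
    import sys
    from collections import defaultdict

    K = 8  # k-mer length, chosen arbitrarily for this sketch

    def map_reads(lines):
        """Mapper: emit (k-mer, 1) for every k-mer in every read."""
        for line in lines:
            read = line.strip().upper()
            for i in range(len(read) - K + 1):
                yield read[i:i + K], 1

    def reduce_counts(pairs):
        """Reducer: sum the emitted counts for each k-mer."""
        totals = defaultdict(int)
        for kmer, count in pairs:
            totals[kmer] += count
        return totals

    if __name__ == "__main__":
        # Reads arrive one per line on stdin, e.g. from a FASTQ pre-processor.
        for kmer, total in sorted(reduce_counts(map_reads(sys.stdin)).items()):
            print(kmer + "\t" + str(total))

In a real deployment the map and reduce stages run across many nodes at once, and the scheduler's job is to keep the latency between those stages low.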

Below is an excerpt from the release:

BGI Shenzhen (China). BGI has developed a set of distributed computing applications using a MapReduce framework to process large genome data sets on clusters. By applying advanced software technologies including HDFS, Lustre, GlusterFS, and the Platform Symphony MapReduce framework, the institute has saved more than $20 million to date. For some application workloads, BGI achieved a significant improvement in processing capabilities while enabling the reuse of storage, resulting in reduced infrastructure costs while delivering results in less than 10 minutes, versus the prior time of 2.5 hours. Some of the applications enabled through the MapReduce framework included: sequencing of 1% of the Human Genome for the International Human Genome Project; contributing 10% to the International Human HapMap Project; conducting research in combating SARS and a German variant of the E. coli bacterium; and completely sequencing the rice genome, the silkworm genome, the potato genome, and the human gut metagenome. Project leader: Lin Fang

Congratulations to the BGI team for their well-deserved recognition.

Rohit Valia
Director, Marketing
HPC Cloud and Analytics

The win in Monaco and a hot season continues



Mark Webber's winning dive into the pool, caught mid-air!

As Red Bull Racing prepares for its upcoming race events, a recent highlight was the Monaco Grand Prix, where the team won the race for the third consecutive time! It was a nail-biting race that also resulted in a second consecutive Monaco win for Mark Webber.

Platform Computing logo on the car. Very cool.
The partnership between Red Bull Racing and Platform Computing, an IBM company, is an exciting one. Seeing the team rise to the top and win the most prestigious race on the calendar… again… was truly amazing! There was nothing but smiles on the Energy Station on race day. A big CONGRATULATIONS to the team, and the best of luck as the season pushes ahead!

For more information on the partnership between Platform Computing and Red Bull Racing, see
www.ibm.com/platformcomputing, check out the video, or read the case study.

Next up: the Grand Prix of Europe.

Infrastructure Cloud Strategies

San Francisco, Monday, Feb 20th, evening. I'm wondering what path most Platform customers will take to the cloud over the next decade and, perhaps more interesting, what the intermediate steps will be as they prepare for that journey. I'm here for the Molecular Tri-Con, and had the pleasure, or more properly the learning experience, of listening to people like Deepak Singh of Amazon, Chris Dagdigian of BioTeam, and Jason Stowe of Cycle Computing.

For a while now, global organizations have recognized that it's much easier to move computing resources and demands to the location where data is housed than the reverse. As such, it's now quite common to hear that regions within globally distributed companies have specialties and take on projects of their own. Naturally, this isn't as efficient as it could be, and perhaps the promise of 3D compression services for remote visualization will eventually free users from only working on data close to them. But that's another story and another blog post.

More and more at these cloud computing conferences, the data movement problem is emphasized, or seen as a barrier to widespread use of cloud computing. Simply put, the time to transfer data up to the cloud, process it, and bring it back often negates the value of using a metered, pay-per-use, effectively infinite compute service. For me this is like not being able to drive a new Nissan GTR out of the garage simply because it doesn't have any gas in the tank.
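A back-of-the-envelope calculation makes the point. The figures below are illustrative assumptions, not measurements from any particular site:

    # Rough transfer-time math (illustrative assumptions, not benchmarks)
    dataset_tb = 10          # assumed dataset size, terabytes
    uplink_mbps = 100        # assumed sustained uplink, megabits per second

    bits = dataset_tb * 1e12 * 8            # terabytes -> bits
    seconds = bits / (uplink_mbps * 1e6)    # divide by uplink in bits/sec
    print("Upload time: %.0f hours (%.1f days)" % (seconds / 3600, seconds / 86400))
    # About 222 hours, or more than nine days, before a single compute
    # cycle has been bought.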

Amazon has recognized the importance of data and made uploads into the cloud free. Of course, storing that data in the cloud isn't free, but then again, neither is buying redundant filers and locating them in geographically dispersed data centers. More cleverly, once data is sitting in the cloud, not only is access to it nearly guaranteed by Amazon, but that data can be easily and cost-effectively manipulated in any number of ways using EC2 instances as operators. There is no secret sauce here, no super-sophisticated technology that Amazon developed with scores of software developers carrying Mensa membership cards in their wallets.
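Mechanically, getting data in really is trivial. Here is a minimal sketch using today's boto3 SDK; the bucket, object key, and file names are hypothetical:

    # Minimal sketch: push a dataset into S3, then pull it onto an EC2 instance.
    # Uses the boto3 SDK; bucket, key, and file names are hypothetical.
    import boto3

    s3 = boto3.client("s3")

    # Inbound transfer is free; you pay for storage and later requests.
    s3.upload_file("genome_reads.fastq.gz",       # local file (hypothetical)
                   "example-research-bucket",     # bucket (hypothetical)
                   "raw/genome_reads.fastq.gz")   # object key

    # Any EC2 instance in the same region can then fetch the object cheaply
    # and act as an "operator" on the data.
    s3.download_file("example-research-bucket",
                     "raw/genome_reads.fastq.gz",
                     "/tmp/genome_reads.fastq.gz")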

Cloud vendors, take note (IBM people included): once an organization starts locating corporate data in a particular cloud as a matter of business policy, that cloud provider has won the battle for where that data will be processed. And processing demand only grows with time.

And the winner is... Jaguar Land Rover for business IT project of the year!

At a glittering event in London, Platform customer Jaguar Land Rover (JLR) was awarded the prestigious Business IT Project of the Year award at the annual UK IT Industry Awards hosted by Computing magazine.

Platform Computing worked with Jaguar Land Rover to create an advanced IT environment that underpins the organisation’s virtual car product development while complying with strict safety and environmental regulations. JLR has deployed a state-of-the-art system consisting of scalable compute clusters and engineering workstations, all built from commodity technologies. The project was recognised for its complexity and for its ability to reduce time to market, engineering costs, and the environmental impact of product development for JLR.

Big congratulations to the team for this significant achievement!

Big Data report from SC’11

In my previous blog I expressed high expectations for the Big Data-related activities at this year’s Supercomputing conference. Coming back from the show, I’d say the enthusiasm for and knowledge of Big Data within the HPC community truly surprised me. Here are the major highlights from the show:

  • Good flow of traffic at the Platform booth for Platform MapReduce. Many visitors stopped by our booth to learn more about Platform MapReduce – a distributed runtime engine for MapReduce applications. I found it easy to talk to the HPC crowd because many folks in this community are already familiar with Platform LSF and Platform Symphony; both are flagship products from Platform that have been deployed and tested in large-scale distributed computing environments for many years. Since Platform MapReduce is built on similar core technology to those mature products, the HPC community quickly understood the key features and functions it brings to Big Data environments. Even though many users are still at an early stage of either developing MapReduce applications or looking into new programming models, they understand that a sophisticated workload scheduling engine and resource management tool will become critically important once they are ready to deploy their applications into production. Many HPC sites were also interested in exploring the potential of leveraging their existing infrastructure for processing data-intensive applications. For instance, questions on how MPI and MapReduce jobs can coexist on the same cluster came up frequently at the show (see the conceptual sketch after this list). The good news: Platform MapReduce is the only solution that provides this mixed-workload capability.
  • “Hadoop for Adults” -- This was a quote from one of the attendees after sitting through our breakfast briefing on overcoming MapReduce barriers. We LOVE it! The briefing attracted over 130 people and well exceeded our expectations! Our presentation on how to overcome the major challenges in current Hadoop MapReduce implementations drew great interest. “Hadoop for Adults” sums up the distinct benefits Platform MapReduce brings. Platform Computing knows how to manage large-scale distributed computing environments, and bringing that same technology into Big Data environments is a natural extension for us. The reaction to Platform MapReduce at SC’11 was encouraging and a validation of our expertise in scheduling and managing workloads and overall infrastructure in a data center.
  • Growing momentum on application development. As sophisticated as always, the HPC community is at the forefront of developing applications to solve data-intensive problems across various industries and disciplines: cyber security, bioinformatics, the electronics industry, and financial services are just a few examples. Many Big Data-related projects are being funded at HPC data centers, and we expect a proliferation of applications to come out of those projects very soon.
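On the mixed-workload question raised above, the essence is a resource manager that lends slots on one physical cluster to whichever workload needs them. The toy allocator below is purely conceptual and exists only to illustrate the idea; it is not Platform MapReduce's actual scheduler, and the job names and share policy are made up:

    # Toy slot allocator for a shared cluster running MPI and MapReduce jobs.
    # Purely conceptual -- not Platform MapReduce's actual scheduler.
    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        kind: str      # "mpi" or "mapreduce"
        slots: int     # cores requested

    def allocate(jobs, total_slots, mapreduce_share=0.5):
        """Give each workload type a share of the cluster; MPI gets the rest."""
        budget = {"mapreduce": int(total_slots * mapreduce_share)}
        budget["mpi"] = total_slots - budget["mapreduce"]
        placed, pending = [], []
        for job in jobs:
            if job.slots <= budget[job.kind]:
                budget[job.kind] -= job.slots
                placed.append(job.name)
            else:
                pending.append(job.name)
        return placed, pending

    jobs = [Job("weather-sim", "mpi", 256),
            Job("log-analysis", "mapreduce", 128),
            Job("genome-align", "mapreduce", 512)]
    print(allocate(jobs, total_slots=1024))
    # -> (['weather-sim', 'log-analysis'], ['genome-align'])

A production scheduler layers preemption, data locality, and fairness policies on top of this basic accounting.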

The show is officially over, but the excitement around Big Data will continue. For me, not only have I gained tremendous insight into the Big Data momentum in HPC, but I’m also pleased to see the overwhelming reaction to Platform MapReduce within the HPC community. Nothing beats pitching the right product to the right audience!