HPC from A-Z (part 25) - Y

Y is for… yield strength

Think how many children play with plasticine and Play-Doh and daydream about being an artist or a sculptor when they grow up. Most adults end up following different career paths, but the true passions of a child often resonate with their adult selves in a different and more advanced form. Just look at the number of engineers in the world!

‘Yield strength’, or ‘yield point’, is in many ways basic knowledge for ‘grown up’ sculptors. It is the stress at which a material begins to deform permanently, so that it can no longer return to its original shape once the load is removed. This is vital information for design, as it sets an upper limit on the load a part can carry, and it is an equally important consideration in materials production. How would we create new objects and materials without this knowledge?
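As a rough illustration of how that upper limit is used in design, the sketch below computes the stress in a simple bar under an axial load and compares it with the yield strength. The function names, the safety factor of 1.5 and the 250 MPa figure (roughly that of a mild structural steel) are assumptions chosen for the example, not values from any particular design code.

```python
def axial_stress_mpa(force_n: float, area_mm2: float) -> float:
    """Return axial stress in MPa (1 N/mm^2 equals 1 MPa)."""
    if area_mm2 <= 0:
        raise ValueError("cross-sectional area must be positive")
    return force_n / area_mm2


def stays_elastic(force_n: float, area_mm2: float,
                  yield_strength_mpa: float,
                  safety_factor: float = 1.5) -> bool:
    """True if the stress stays at or below yield strength / safety factor.

    The safety factor is a hypothetical design margin for this sketch.
    """
    allowable_mpa = yield_strength_mpa / safety_factor
    return axial_stress_mpa(force_n, area_mm2) <= allowable_mpa


if __name__ == "__main__":
    # Illustrative numbers only: 20 kN on a 100 mm^2 bar, 250 MPa yield strength.
    print(axial_stress_mpa(20_000, 100))
    print(stays_elastic(20_000, 100, 250))
```

With these illustrative numbers the bar sits below the yield strength itself but above the reduced allowable stress, so the check fails once the safety margin is applied. Real simulations resolve stress across a whole part rather than a single average value.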

Times have moved on since my Play-Doh days, and the engineers of today no longer need to determine the yield strength of a material by stacking weights on a sample one by one. Instead, nearly all of this testing is done through simulation, so that extensive (and expensive) real-world testing is only conducted on a select few prototypes. Think about the materials used to build space shuttles. Not only do they cost a small fortune, but a mistake such as using a material with the wrong yield strength could put human lives at risk. Simulation is crucial to avoiding these kinds of issues.

The role of HPC? It’s all about the high-performance cluster technology that engineers at the heart of material development need in order to vet prototypes without ever having to develop scale models. It’s a profit enabler, allowing faster product development and shorter time to market for materials and for the designers who make use of them.

Blog Series – Five Challenges for Hadoop MapReduce in the Enterprise

With the emergence of “big data” have come a number of new programming methodologies for collecting, processing and analyzing large volumes of often unstructured data. Although Hadoop MapReduce is one of the promising approaches for processing and organizing results from unstructured data, the engine running underneath MapReduce applications is not yet enterprise-ready. At Platform Computing, we have identified five major challenges in the current Hadoop MapReduce implementation:

· Lack of performance and scalability
· Lack of flexible and reliable resource management
· Lack of application deployment support
· Lack of quality of service
· Lack of multiple data source support

I will be taking an in-depth look at each of the above challenges in this blog series. To finish, I will share our vision of what an enterprise-class solution should be: one that not only addresses the five challenges customers are currently facing, but also goes beyond them to explore the capabilities of the next-generation Hadoop MapReduce runtime engine.

Challenge #1: Lack of performance and scalability

Currently, the open source Hadoop MapReduce programming model does not provide the performance and scalability needed for production environments, mainly because of its fundamental architectural design.

On the performance front, to be most useful in a robust enterprise environment a MapReduce job should take less than a millisecond to start, but job startup time in the current open source MapReduce implementation is measured in seconds. This high initial latency delays the final results and can cause significant financial loss to an organization. For instance, in the capital markets segment of the financial services sector, a millisecond of delay can cost a firm millions of dollars.

On the scalability front, customers are looking for a runtime solution that is capable not only of scaling one MapReduce application as the problem size grows, but also of supporting multiple applications of different kinds running across thousands of cores and servers at the same time. The current Hadoop MapReduce implementation does not provide such capabilities. As a result, a customer has to assign a dedicated cluster to each MapReduce job and run applications one at a time. This lack of scalability not only introduces additional complexity into an already complex data center, making it harder to manage, but also creates a siloed IT environment in which resources are poorly utilized.
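To put the startup latency point in rough numbers, the sketch below computes what fraction of a job's total time is spent starting up rather than doing useful work. The function name and the timings (a 5 second startup, a 0.5 millisecond startup and half a second of work) are hypothetical values picked for illustration, not measurements of any Hadoop release.

```python
def startup_overhead_fraction(startup_s: float, work_s: float) -> float:
    """Fraction of total job time spent on startup rather than on useful work."""
    if startup_s < 0 or work_s < 0:
        raise ValueError("times must be non-negative")
    total_s = startup_s + work_s
    if total_s == 0:
        raise ValueError("total job time must be positive")
    return startup_s / total_s


if __name__ == "__main__":
    work_s = 0.5  # hypothetical short job: half a second of useful work
    for startup_s in (5.0, 0.0005):  # hypothetical: seconds vs. sub-millisecond
        share = startup_overhead_fraction(startup_s, work_s)
        print(f"startup {startup_s} s -> {share:.1%} of total job time")
```

Under these assumptions, a startup measured in seconds accounts for most of a short job's elapsed time, while a sub-millisecond startup is negligible. The effect shrinks for long-running batch jobs, where the same fixed startup cost is spread over much more work.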

A lack of guaranteed performance and scalability is just one of the roadblocks preventing enterprise customers from running MapReduce applications at production scale. In the next post, we will discuss the shortcomings of resource management in the current Hadoop MapReduce offering and examine their impact on organizations tackling “big data” problems.

HPC from A-Z (part 24) - X

X is for X-ray analysis

There’s no denying that science is significantly more advanced than ever before. In fact, I would argue that today’s doctors and scientists would be lost if asked to work in the conditions experienced by their counterparts in times gone by.

Modern technology has allowed medicine and associated scientific disciplines to go beyond the simple diagnosis of coughs and colds or a broken leg. Researchers can now extract huge amounts of detail about atom arrangement and electron density using techniques such as X-ray crystallography modelling, and gather physical evidence for their theories.

For example, a deeper understanding of biological molecules means bioscientific researchers can examine molecular dynamics and look further into protein analysis and identification, sequence alignment and annotation, and structure determination analysis. The results are fascinating, and they are really driving the growth of knowledge in the scientific profession. What many don’t realize, however, is the sheer volume of data behind these discoveries and how much information advanced research techniques actually generate.

Another example is medical imaging studies, which use data from digitized X-ray images. In this case HPC is used to glean information from numerous medical databases. In fact, supercomputing facilities, along with industry, are today exploring ways to ensure access to massive amounts of stored digital data for future research.

With HPC able to provide such insight into scientific happenings here on Earth, just think what insights could come from NASA’s Chandra X-ray Observatory about the wider Universe (and ET!) in years to come…