The attack of the parallel app!

Posted: January 19, 2011 in Uncategorized


In the past decades, traditional silicon-based “sequential” computers have run up against physical limits. From the early 1980s until about a decade ago, the main route to better performance was to raise the clock speed and shrink the elemental transistor, and new processor families regularly arrived with higher core frequencies and smaller process sizes.
As you may know, CMOS transistor technology (the one used in most of today’s electronics) consumes power every time a transistor switches from one logic state to the other, and that power is dissipated as heat. At around 3.8 GHz it became too difficult to keep those micro-hells cool inexpensively.
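The scaling behind that wall can be sketched with the standard dynamic-power relation for CMOS, P ≈ αCV²f. The numbers below are purely illustrative, not measurements from any real chip:

```python
# Dynamic (switching) power in CMOS scales as P ~ a * C * V^2 * f, where
# a is the activity factor, C the switched capacitance, V the supply
# voltage and f the clock frequency. All parameter values here are
# made up for illustration only.

def dynamic_power(activity, capacitance, voltage, freq_hz):
    return activity * capacitance * voltage ** 2 * freq_hz

p_3ghz = dynamic_power(0.2, 1e-9, 1.2, 3.0e9)
p_4ghz = dynamic_power(0.2, 1e-9, 1.2, 4.0e9)
print(p_4ghz / p_3ghz)  # power grows linearly with frequency: ~1.33
```

At fixed voltage, every step up in frequency buys a proportional step up in heat, which is why simply cranking the clock stopped being viable.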

Besides, at higher frequencies the clock period decreases, which means that the distance an electric pulse can travel within one cycle shortens. This causes problems when two functional units are too far apart inside the chip, forcing designers to place some sort of latch between them just to keep them working at that frequency.

In the end, Intel cancelled its plans for a 4 GHz Pentium 4 in favor of dual-core designs.

Today, processors and other computing devices are improved by adding parallel processing units, but the concept is not new at all. The first multiprocessor machines appeared at selected universities in the 1970s; in the 1980s the hardware and tools for parallel computing matured considerably and supercomputing companies such as Cray emerged. During the 1990s and 2000s, both the computing power and the electric power consumption of these machines grew exponentially.

However, CPUs are not the only parallel processors you can find. Graphics cards (even the oldest fixed-function GPUs) have always had an inherently parallel architecture, because many of the computations needed to render a frame can be carried out in parallel (for example, the 3dfx Voodoo2 had two parallel texturing units back in 1998).

Today, graphics cards can be programmed in several ways. The first programmable cards let developers replace certain stages of the graphics pipeline, such as vertex and fragment processing, with their own programs (i.e., shaders) running on the GPU in parallel with the CPU.

In those days, people who wanted to use the GPU for general, non-graphics purposes would encode their data into textures, run a shader on the GPU, and read the results back out of another texture. The development of a hardware and software architecture for running general-purpose programs on the GPU was just a matter of time, and today we can do so with technologies like CUDA and OpenCL. This is called General-Purpose computation on Graphics Processing Units, or GPGPU.

Now, many of the largest supercomputers in the TOP500 list (currently headed by the Chinese GPGPU-based Tianhe-1A at 2.57 petaflops) leverage these technologies, which provide tons of computing power at a lower electric power consumption than traditional systems.

Saving lives massively

There’s a wide range of problems that would not be viable to solve without the aid of a supercomputer. But… where do I begin searching?

You might be asking yourself: In which kind of problems are the biggest supercomputers being used?

[Graph: TOP500 supercomputer usage, November ’10]

Take a look at the graph and you’ll see a big fat “Not Specified” slice of the pie. I’m not sure what it covers, but I would guess undisclosed military or government projects, or perhaps secret commercially-oriented industrial applications.

The next biggest field of application is research. This makes sense, as pharmaceutical companies and universities make extensive use of these machines. As you can see, supercomputer applications range from aerospace to finance.

Let’s take a look at the paper “Simulation and modeling of synuclein-based ‘protofibril’ structures as a means of understanding the molecular basis of Parkinson’s disease”.


In a nutshell, researchers at the University of California, San Diego conducted a study to determine how alpha-synuclein proteins (whose function is unknown) aggregate and bind to cell membranes, eventually disrupting normal cell functions, a process associated with Parkinson’s disease.

These membrane-binding surfaces could then be targeted by pharmaceutical intervention to dissolve the aggregates.


For this purpose, they used molecular modeling and simulation techniques to predict the folding pathway of the protein into the structures that bind to membranes.



They first developed a program called MAPAS to assess how strongly the protein interacts with the membrane. Then they ran a set of molecular dynamics simulations on the IBM Blue Gene, which essentially identified the membrane regions with the highest binding probability. With this data, they performed further molecular simulations on those regions to model the binding process itself. As the authors note, the results produced by MAPAS matched previously known results on a test problem.



In my opinion, the Blue Gene isn’t actually better suited for these applications than other machines out there, but I guess it was more accessible to the researchers, since it was conceived for biomolecular research and was geographically close (California).


This project took 1.2 million processor hours.

What’s next?

It turns out that the parallel paradigm is both the present and the future trend, so learning these technologies is worth it. This doesn’t come for free, of course: parallel applications are harder to write and debug than sequential ones. The programmer must face new difficulties such as race conditions, which lead to synchronization problems and which, in turn, can end up in deadlocks or incorrect results if not watched closely.
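The classic race condition is two threads doing a read-modify-write on the same variable. A minimal sketch using Python’s standard `threading` module (the names and counts are illustrative, not from any real program):

```python
import threading

# Several threads increment a shared counter. The increment is a
# read-modify-write, so without the lock two threads can read the same
# old value and one update gets lost; the lock serializes the updates.

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:  # remove this lock and the final count may come out short
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock; unpredictable without it
```

The nasty part is that the unsynchronized version often *happens* to produce the right answer, which is exactly why these bugs are so hard to catch in testing.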

Another important aspect of parallelism is how a global problem is broken down into smaller ones that can be solved individually by different processing elements and then put back together to compute the final result. This is, of course, absolutely problem- (and machine-) dependent, and it is an important task, though not always an easy one. Once again, “a good programmer must know the machine he’s programming for”.
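That split/solve/combine pattern can be sketched in a few lines. Here a thread pool stands in for the processing elements (for CPU-bound Python code you would need a process pool or something like MPI to get real speedup); the problem size and worker count are illustrative choices:

```python
from concurrent.futures import ThreadPoolExecutor

# Split a global problem (summing a range of numbers) into chunks,
# solve each chunk with a separate worker, then combine the partial
# results into the final answer.

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

n, n_workers = 1_000_000, 4
step = n // n_workers
chunks = [(i, min(i + step, n)) for i in range(0, n, step)]  # split

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    partials = list(pool.map(partial_sum, chunks))           # solve in parallel

total = sum(partials)                                        # combine
print(total)  # 499999500000, the same as sum(range(1_000_000))
```

Summing decomposes trivially because addition is associative; many real problems (stencils, graph algorithms) need communication between the chunks, and that is where the machine-dependence bites.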

The medium-term goal is reaching 1 exaflop of processing power, and it seems likely that computer technology will change once more along the way. Moore’s law, which held true for decades, no longer translates into faster individual cores; however, it can still be extrapolated to whole processors as core counts grow.

As a matter of fact, technology is evolving continuously and quickly. Who knows where it will lead us in the future!

