In the next few posts we will talk about using multi-core processors in practice. Whatever is said about multi-core hardware, programs still have to be "taught" to use several cores efficiently. This first post contains the announcement of the upcoming issues and the first introductory note.
We should point out right away that there are quite a few parallel programming technologies. They differ not so much in programming language as in the architectural approach to building parallel systems.
For example, some technologies build parallel solutions on several computers (of the same type or of different types), while others work on a single machine with several processor cores.
Systems built on several computers belong to the class of distributed computing systems. Such solutions have been in use for a long time, they are well understood by industry experts, and there is plenty of literature on them. The most telling example of a distributed computing technology is MPI[http://www.viva64.com/terminology/MPI.html] (Message Passing Interface). MPI is the most popular standard for data exchange in parallel programming, and it has been implemented on many computer platforms. MPI gives the programmer a single mechanism for interaction between the branches of a parallel application, regardless of the computer architecture (single-processor or multi-processor, with shared or separate memory) and of the relative location of the branches (on one processor or on different ones).
As MPI is intended for systems with separate memory, it is not a very good choice for organizing parallel work on a system with shared memory: there it is too redundant and complicated. That is why solutions such as OpenMP began to develop. Still, nothing prevents you from building MPI solutions for a single computer.
Parallel programming systems for a single machine, by contrast, began to develop relatively recently. These are not brand-new solutions, of course, but it is the appearance (more precisely, the approaching appearance) of multi-core systems on desktops that makes programmers consider technologies such as OpenMP, Intel Threading Building Blocks, Microsoft Parallel Extensions and some others.
It is very important that a parallel programming technology lets you parallelize a program gradually. Of course, an ideal parallel program would be parallel from the very beginning and, better still, written in some functional language where the question of parallelization does not arise at all... But programmers live and work in the real world, where they have 10 Mbytes of code in C++ at best, or even in C, instead of the trendy functional F#. And they have to parallelize this code gradually. In this case OpenMP (for instance) is a very fortunate choice. It allows you to pick the fragments of the application that need parallelization most and make them parallel first.
In practice it looks like this. With the help of a profiling tool, the programmer looks for the bottlenecks, that is, the slowest places in the program. Why use a tool at all? Because you will not find the bottlenecks in an unfamiliar 10-Mbyte project unless you are a telepath. These bottlenecks are then made parallel with OpenMP. After that you may look for the next bottlenecks, and so on until you reach the performance you need. Development of the parallel version can be paused while you release intermediate products and resumed whenever necessary. That is, in particular, why OpenMP became rather popular.
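As a minimal sketch of this gradual approach, suppose a profiler has pointed at a loop that sums the squares of a large array. The function name sum_of_squares and the loop itself are hypothetical and serve only as an illustration. The serial code stays as it was; the only addition is one directive in front of the loop. A compiler without OpenMP support (or with OpenMP switched off) ignores the directive and builds the ordinary serial version.

```cpp
#include <vector>

// Hypothetical hot spot found by a profiler: a sum of squares over a large array.
// The only difference from the serial version is the pragma line.
double sum_of_squares(const std::vector<double>& data) {
    double total = 0.0;
    // A signed loop index keeps the loop acceptable to older OpenMP implementations.
    const long long n = static_cast<long long>(data.size());

    // Each thread accumulates its own private partial sum;
    // reduction(+ : total) adds the partial sums together when the loop ends.
    #pragma omp parallel for reduction(+ : total)
    for (long long i = 0; i < n; ++i) {
        total += data[i] * data[i];
    }
    return total;
}
```

With GCC or Clang, OpenMP is usually enabled with the -fopenmp option. Note that with floating-point data the parallel sum may differ from the serial one in the last digits, because the order of additions changes.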
OpenMP (Open Multi-Processing) is a set of compiler directives, library routines and environment variables intended for programming multithreaded applications on multi-processor systems with shared memory (SMP systems).
The first OpenMP standard was developed in 1997 as an API for writing easily portable multithreaded applications. At first it targeted Fortran, and later C and C++ were added.
The OpenMP interface has become one of the most popular parallel programming technologies. It is used successfully both on supercomputer systems with many processors and on desktop systems or, for example, on the Xbox 360.
The OpenMP specification is developed by several large hardware and software vendors, whose work is coordinated by the non-profit organization "OpenMP Architecture Review Board" (ARB).
OpenMP uses the "fork-join" model of parallel execution. An OpenMP program begins as a single thread of execution called the master thread. When this thread encounters a parallel construct, it creates a new team of threads that consists of itself and some additional threads, and becomes the master thread of that team. All the members of the team (including the master thread) execute the code inside the parallel construct. At the end of the parallel construct there is an implicit barrier. After the parallel construct, only the master thread continues to execute the user code. A parallel region may contain other parallel regions, in which each thread of the outer region becomes the master thread of its own team. Nested regions may in turn contain regions of deeper nesting levels.
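The fork-join behavior can be illustrated with a small sketch. The names RegionReport and run_parallel_region are invented for this example; the serial fallback for omp_get_num_threads() is an assumption added so that the code also builds when OpenMP is not enabled.

```cpp
#include <cstdio>

#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch also builds without OpenMP support.
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper type: what we observed about one parallel region.
struct RegionReport {
    int visits;     // how many threads executed the region body
    int team_size;  // team size reported by the OpenMP runtime
};

RegionReport run_parallel_region() {
    RegionReport report = {0, 1};
    std::printf("before the region: only the master thread runs\n");

    #pragma omp parallel
    {
        // Fork: every member of the team executes this block.
        #pragma omp atomic
        ++report.visits;

        // Exactly one thread of the team records the team size.
        #pragma omp single
        report.team_size = omp_get_num_threads();
    }   // Join: implicit barrier, then only the master thread continues.

    std::printf("after the region: %d thread(s) took part\n", report.visits);
    return report;
}
```

Since every thread of the team passes through the region body exactly once, the counter visits should equal team_size after the join.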
The number of threads in a team that executes a parallel region can be controlled in several ways. One of them is the environment variable OMP_NUM_THREADS. Another is a call to the library routine omp_set_num_threads(). One more way is to use the num_threads clause together with the parallel directive.
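Here is a hedged sketch of the last two methods. The function names team_size_after_set and team_size_with_clause are made up for this illustration, and the serial fallbacks are an assumption so that the code builds without OpenMP as well. The environment variable needs no code at all: it is set before the program starts, for example OMP_NUM_THREADS=4 in the shell. The requested value is an upper bound, and the runtime may give the region fewer threads.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without OpenMP support.
static int  omp_get_num_threads() { return 1; }
static void omp_set_num_threads(int) {}
#endif

// Method 2: the library routine sets the team size for subsequent parallel
// regions. It overrides the OMP_NUM_THREADS environment variable.
int team_size_after_set(int requested) {
    int size = 1;
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        #pragma omp single
        size = omp_get_num_threads();
    }
    return size;
}

// Method 3: the num_threads clause applies to this one region only
// and takes precedence over the two other methods.
int team_size_with_clause(int requested) {
    int size = 1;
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single
        size = omp_get_num_threads();
    }
    return size;
}
```

For example, team_size_with_clause(1) always reports a team of one thread, while team_size_with_clause(3) reports at most three.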
With this note we begin a short series of posts devoted to OpenMP and to the toolkit for parallel software development. In the next posts you will learn:
Stay tuned for the next lesson...