Parallel notes N3 - base OpenMP constructs

Mar 02 2010

Author: Andrey Karpov

Functions
Directives
parallel directive
for directive
Directives private and shared

This post begins our introduction to OpenMP technology and shows how to use it. Here we will discuss some basic constructs.

When using OpenMP we add two kinds of constructs to a program: OpenMP runtime functions and special "#pragma" directives.

Functions

OpenMP functions play a rather auxiliary role because parallelization is implemented through directives. But in some cases they are very useful and even necessary. The functions fall into three categories: execution environment functions, lock/synchronization functions and timer functions. All of them have names beginning with "omp_" and are declared in the header file omp.h. We will discuss the functions in the next posts.
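As a quick preview, here is a minimal sketch that touches one function from each of the three categories. The struct RuntimeInfo and the function QueryRuntime are hypothetical names invented for this illustration, and the num_threads(4) clause is just an assumption to make the example concrete. The _OPENMP guard keeps the code compilable when OpenMP is not enabled.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type: collects what the OpenMP runtime reports.
struct RuntimeInfo {
  int max_threads;  // obtained from an execution environment function
  double elapsed;   // measured with the timer function, in seconds
  int counter;      // incremented by every thread under a lock
};

RuntimeInfo QueryRuntime() {
  RuntimeInfo info = {1, 0.0, 0};
#ifdef _OPENMP
  // Execution environment function.
  info.max_threads = omp_get_max_threads();

  // Timer function: wall-clock time in seconds.
  double start = omp_get_wtime();

  // Lock functions: only one thread at a time may change the counter.
  omp_lock_t lock;
  omp_init_lock(&lock);
  #pragma omp parallel num_threads(4)
  {
    omp_set_lock(&lock);
    ++info.counter;
    omp_unset_lock(&lock);
  }
  omp_destroy_lock(&lock);

  info.elapsed = omp_get_wtime() - start;
#else
  // Serial build: the only "thread" is the program itself.
  info.counter = 1;
#endif
  assert(info.counter >= 1);  // at least one thread always runs
  return info;
}
```

After the call, info.counter holds the number of threads that actually ran the region. The runtime may give us fewer threads than the four we asked for.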

Directives

In C/C++ a #pragma construct is used to pass additional options to the compiler. With these constructs you may specify how data in structures must be aligned, suppress particular warnings and so on. A #pragma construct is written in this format:

#pragma directive

The special keyword "omp" indicates that the commands relate to OpenMP. Thus, #pragma directives intended for OpenMP have the following format:

#pragma omp <directive> [clause [ [,] clause]...]

Like any other pragmas, they are ignored by compilers that do not support this technology. In this case the program is compiled as a serial one without any errors. This feature allows you to create highly portable code based on OpenMP: code containing OpenMP directives may be compiled by a C/C++ compiler that knows nothing about the technology. The code will then run serially, but this is better than splitting the code into two branches or adding a lot of #ifdef.

OpenMP supports the directives parallel, for, section, sections, single, master, critical, flush, ordered, atomic and some others, which define work distribution mechanisms and synchronization constructs. Directives can be refined with clauses such as private.
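The portability described above can be illustrated with a small sketch. The function names Squares and BuiltWithOpenMP are made up for this example. The code relies on the _OPENMP macro, which a compiler defines when OpenMP support is enabled. The function Squares gives the same result whether the pragma is honored or ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example function: fills a vector with squares of indices.
// If the compiler ignores the pragma, the loop simply runs serially.
std::vector<int> Squares(int n) {
  assert(n >= 0);
  std::vector<int> out(static_cast<std::size_t>(n));
  #pragma omp parallel for
  for (int i = 0; i < n; i++)
    out[static_cast<std::size_t>(i)] = i * i;
  return out;
}

// Reports whether this translation unit was compiled with OpenMP enabled.
bool BuiltWithOpenMP() {
#ifdef _OPENMP
  return true;
#else
  return false;
#endif
}
```

Note that only the directives disappear in a serial build. Calls to the "omp_" functions are ordinary function calls, so they still have to be guarded if the code must compile without OpenMP.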

parallel directive

The "parallel" directive may be called the most important one. It creates a parallel region for the structured block that follows it, for example:

#pragma omp parallel [clause [ [,] clause]...]
  structured block

The "parallel" directive specifies that the structured block must be executed concurrently by several threads. Each of the created threads executes the same code in the block, but not necessarily the same sequence of commands. Different threads may take different branches or process different data, depending on "if-else" statements or work distribution directives.

To demonstrate execution of code in several threads, let us print some text in the block being parallelized:

#pragma omp parallel
{
  cout << "OpenMP Test" << endl;
}

On a 4-core computer, we may expect the following result to be printed:

OpenMP Test
OpenMP Test
OpenMP Test
OpenMP Test

But in practice I got this one:

OpenMP TestOpenMP Test
OpenMP Test
OpenMP Test

This happens because several threads share one resource. Here four threads print text to the same console without coordinating the printing order with each other. This is a race condition.

A race condition is an error in the design or implementation of a multitasking system in which the system's behavior depends on the order in which code fragments are executed. This kind of error is the most common in parallel programming and a very tricky one. It is difficult to reproduce and localize because it is not permanent and occurs only from time to time (see also the term heisenbug).
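One possible way to keep the lines of the printing example intact is the "critical" directive, which lets only one thread at a time execute the block that follows it. The sketch below is an illustration under that assumption, not the only solution. The names PrintGreetings and GreetingsAsText are invented for this example, and the output goes to a stream parameter so that the result can be checked.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Every thread of the team writes one line, but only one thread at a time
// is allowed inside the critical block, so the lines cannot be mixed.
void PrintGreetings(std::ostream &out) {
  #pragma omp parallel
  {
    #pragma omp critical
    {
      out << "OpenMP Test" << '\n';
    }
  }
}

// Hypothetical helper: collects the output in a string instead of the console.
std::string GreetingsAsText() {
  std::ostringstream buffer;
  PrintGreetings(buffer);
  std::string text = buffer.str();
  assert(!text.empty());  // at least one thread always runs
  return text;
}
```

Calling PrintGreetings(std::cout) prints one whole line per thread. The order in which the threads print is still not defined, but here it does not matter because all the lines are identical.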

for directive

The example above demonstrates how parallelization is implemented but is of no practical use by itself. Now let us get a real benefit from parallelization. Suppose we need to take the square root of each item of an array and write the result into another array:

void VSqrt(double *src, double *dst, ptrdiff_t n)
{
  for (ptrdiff_t i = 0; i < n; i++)
    dst[i] = sqrt(src[i]);
}

If we write it like this:

#pragma omp parallel
{
  for (ptrdiff_t i = 0; i < n; i++)
    dst[i] = sqrt(src[i]);
}

we will just do a lot of unnecessary work instead of speeding up the code: each thread will take the square roots of all the array items. To parallelize the loop we need the work distribution directive "for". The directive "#pragma omp for" specifies that the iterations of the for loop must be distributed among the threads of the team in the parallel region:

#pragma omp parallel
{
  #pragma omp for
  for (ptrdiff_t i = 0; i < n; i++)
    dst[i] = sqrt(src[i]);
}

Now each created thread will process only the part of the array assigned to it. For example, suppose there are 8000 array items and we have a four-core computer. The work may then be distributed in this way: the variable "i" takes values from 0 to 1999 in the first thread, from 2000 to 3999 in the second, from 4000 to 5999 in the third and from 6000 to 7999 in the fourth. In theory, the work speeds up 4 times. In practice the gain is a bit smaller because we need to create the threads and wait for them to finish. At the end of the parallel region a barrier synchronization is performed. In other words, on reaching the end of the region each thread waits until all the other threads have finished their work.

You may shorten the text by combining several directives into a single line. The code above is equivalent to:
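To see how the iterations are actually distributed, we can record the number of the thread that handles each index. This is only a sketch: the function name RecordOwners is made up, and the clauses num_threads(4) and schedule(static) are assumptions added here to request four threads and roughly equal contiguous chunks. Without them the distribution is up to the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: for each index, returns the number of the thread
// that executed the corresponding loop iteration.
std::vector<int> RecordOwners(std::ptrdiff_t n) {
  assert(n >= 0);
  std::vector<int> owner(static_cast<std::size_t>(n), -1);
  #pragma omp parallel num_threads(4)
  {
    #pragma omp for schedule(static)
    for (std::ptrdiff_t i = 0; i < n; i++) {
#ifdef _OPENMP
      owner[static_cast<std::size_t>(i)] = omp_get_thread_num();
#else
      owner[static_cast<std::size_t>(i)] = 0;  // serial build: one thread
#endif
    }
  }
  return owner;
}
```

For n = 8000 each of the four threads typically receives one contiguous block of 2000 indices, which matches the distribution described above. If the runtime provides fewer threads, the blocks are correspondingly larger.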

#pragma omp parallel for
for (ptrdiff_t i = 0; i < n; i++)
  dst[i] = sqrt(src[i]);

Directives private and shared

Data may be shared or private with respect to parallel regions. Private data belong to only one thread and can be initialized only by this thread. Shared data are available to all the threads. In the example above, the arrays were shared. If a variable is defined outside a parallel region, it is considered shared by default, but if it is defined inside the region, it is private. Suppose we want to use an intermediate variable "value" to calculate the square root:

double value;
#pragma omp parallel for
for (ptrdiff_t i = 0; i < n; i++)
{
  value = sqrt(src[i]);
  dst[i] = value;
}

In this code, the variable "value" is defined outside the parallel region created by the directive "#pragma omp parallel for" and is therefore shared. As a result, all the threads will use "value" simultaneously. This causes a race condition, and we will end up with garbage.

There are two ways to make the variable private for each thread. The first is to define the variable inside the parallel region:

#pragma omp parallel for
for (ptrdiff_t i = 0; i < n; i++)
{
  double value;
  value = sqrt(src[i]);
  dst[i] = value;
}

The second is to use the "private" clause. Now each thread will work with its own copy of the variable "value":

double value;
#pragma omp parallel for private(value)
for (ptrdiff_t i = 0; i < n; i++)
{
  value = sqrt(src[i]);
  dst[i] = value;
}

Besides the "private" clause there is also a "shared" clause. It is rarely used, because all the variables defined outside the parallel region are shared by default and the clause is not needed. Still, you may use it to make the code more comprehensible.
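As an illustration of spelling out the sharing explicitly, the sketch below adds the default(none) clause, which is not discussed in this post. With it the compiler refuses to guess, and every variable used in the region must be listed as shared or private. The function name VSqrtExplicit is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical variant of VSqrt that lists the sharing of every variable.
void VSqrtExplicit(const double *src, double *dst, std::ptrdiff_t n) {
  assert(n >= 0);
  double value;  // each thread gets its own uninitialized copy
  #pragma omp parallel for default(none) shared(src, dst, n) private(value)
  for (std::ptrdiff_t i = 0; i < n; i++) {
    value = std::sqrt(src[i]);  // written before it is read
    dst[i] = value;
  }
}
```

Keep in mind that a private copy does not inherit the value the variable had before the region, so it must be assigned inside the region before it is read, as is done here.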

We have discussed only a few OpenMP directives and will continue to study them in the following lessons.