Andrey Karpov , Evgenii Ryzhkov

Feb 16 2008

Adaptation of the technology of the static code analyzer for developing parallel programs

Feb 16 2008

Author: Andrey Karpov , Evgenii Ryzhkov

Introduction
1. The perspectives of systems of development and testing parallel programs
2. Dynamic and static analysis as the most perspective methods for testing parallel code
3. The sphere of effective application of static parallel code analyzer
4. Static code analysis on the example of the technology OpenMP
Conclusion
References

In the article the question of use of the static code analyzers in modern parallel program development processes is considered. Having appeared in 70-80s as an addition to compilers, the static analyzers stopped to be popular with the developers in 90s. The reason was probably the increase of the errors diagnostics quality by the compilers. But in 2000s the interest to the static code analyzers started to increase again. It is explained by the fact that new static code analyzers were created, which started to detect quite difficult errors in programs. If the static code analyzers of the past made it possible, for example, to detect an uninitialized variable, modern static code analyzers tend to detect an unsafe access to data from several threads. The modern tendency of static code analyzers development became their use for diagnosing errors in parallel programs. In the work the situations are considered, where the use of such tools makes it possible to considerably simplify the process of creating parallel program solutions.

Introduction

One of the actively developing tendencies in the sphere of computer engineering is parallel computing. Parallelism becomes the basis of the growth of efficiency, which is connected with the slowdown of rate of growth of clock frequency of modern microprocessors. If during the latest 30 years the efficiency was determined by the clock frequency, optimization of commands execution, cache increase, then in the nearest years it will be determined by multi-core.

The processors producers shifted the accent from increasing the clock frequency to the realization of the parallelism in the processors proper at the expense of using multi-core architecture. The essence of the idea is to integrate more than one core into one processor. This approach makes it possible to avoid many technological problems connected with increasing clock frequencies and to create at the same time more efficient processors. The system including the processor with two core in fact does not differ from two-processor computer and the system with a four-core processor does not differ from a four-processor computer.

It is very difficult for the programmers who have begun to use multi-core systems to direct themselves in all the details of their using during developing programs of classic applications. As the practice shows, the difficulties begin when the claims of the efficiency and mobility are laid to the parallel software under development. It is connected with the fact that universal tools that facilitate the work of the programmer and provide a full access to debugging information are on the stage of development. The problem consists in the absence of standards in the sphere of creating and debugging programs for parallel systems for the reason of the computing sphere being young.

The development of multi-core computing machinery is closely connected with the development of the technology of parallel programming, both universal and under a certain architecture of a specific system. One of the directions in the sphere of improvement of the methodology of parallel programming is the research work in the sphere of static program code analyzer with the aim of revelation of errors typical of parallel application.

1. The perspectives of systems of development and testing parallel programs

A joke is popular with the specialists who are occupied with parallel calculations: "Parallel computing is the technology of the future, and this will be ever urgent". This joke has kept its urgency for several decades already. Similar spirits were spread among the community of computers architecture developers, concerned by the fact that the limit of clock frequency of processors will be reached soon, though much slower than before. The fusion of specialists' optimism about parallel calculations and systems architects' pessimism favored the appearance of revolutionary multi-core processors [1].

Parallel computing ceased to be the privilege of large hardware-software complexes meant for big tasks and now come to be a means for solving everyday tasks linked with work, studies, entertainment, computer games, medicine.

But a huge layer of tasks connected with software development that takes into account the aspects of parallel programming is clearly observed at the introduction of new highly efficient parallel counting systems. The development of parallel software is a difficult task, which is now taken up mainly by research institutes of modeling, organizations connected with the data bulk and other big establishments. But this is not practically taken up by ordinary programmers, who take part in developing applied software for an ordinary user. Such programs are in majority and as a consequence the majority of programmers have never run across the tasks of parallel data processing.

The crisis in the sphere of software development is outlined connected with the forthcoming of multi-core systems at the mass market. And this crisis will be connected with the lack of specialists creating and supporting parallel code or capable of modifying older developments taking into consideration the technology of paralleling. Such specialists are quite few and are well placed and have a good well-paid job.

Naturally many programmers will later master new technologies and will get the skill of creating parallel code. But a lot of time will pass and not everyone will be likely to realize such transition successfully. The thing is that the technology of parallel programming is much more difficult than other technologies changes like, for example, migration from C++ to C# or from 32-bit systems to 64-bit. Parallel programming requires from a person the knowledge of theory of parallel algorithms, synchronization, means of construction of parallel code and many other things.

The necessity of tools facilitating the adaptation and testing the existing software for multi-core counting systems will naturally occur.

Many steps are taken and a number of decisions and methodologies are offered in the direction of automation and facilitating the creation of parallel program systems. The examples are: MPI, OpenMP, Intel Threading Analysis Tools, or, for example, a Russian development OpenTS [2] by Program Systems Institute of the Russian Academy of Sciences. But the existence of all these systems unfortunately does not give the opportunity to speak of complex solution of problems, which a beginning or an experienced programmer comes across.

If the questions of the technologies making it possible to create parallel programs can be considered closed, then the questions of testing and verification of parallel program code demand great attention and creating corresponding support systems.

2. Dynamic and static analysis as the most perspective methods for testing parallel code

The main ways of program code verification are: the program proof, white box method, black box method, code review, dynamic code analysis, static code analysis. Unfortunately, in case of parallel code the greatest practical result is presented by two last methods only. We will try to give proof the this statement.

The program proof. It does not raise any doubts that only this approach makes it possible to guarantee an algorithm correctness [3]. But the research of parallel programs must not be carried out in terms of relations between incoming and outcoming data of the program like it is usually for the successive/consecutive programs. This show that the verification and proof of the correctness of work of such programs demands the development of adequate means of formal specification. In particular, it is necessary to have an opportunity to formally express the relation between the states of the system at the moments of time when these or those take place during the work of the program system [4].

Adding extra complexity makes this approach even more tedious and grounded only at creating critical program systems. If to take into account the fact that the proof of code correctness is not practiced even for series code, then we can't consider this method as practical methodology of parallel programs verification.

White box method. Let us understand under the white box method testing the execution of highest available quality of different parts of the code with the use of debugging or other tools. The more coverage of the code was achieved, the more completely the testing was carried out. The white box method is sometimes also understood as a simple application debugging with the aim of finding a certain error. A complete testing with the help of a white box method has become impossible long time ago due to huge code volume of modern programs. In case of parallel programs this method may turn out to be absolutely useless and inapplicable in practice. The reason consists in the fact that, like in the physic of elementary particles, the effect is observed when a change of a certain parameter influences on the state of the whole system. Thus, it is possible to make a conclusion that this method is of no great practical interest.

The black box method established a much better reputation. Unit test can be also referred here. The main idea of the method consists in writing a set of tests for separate modules and functions, which will verify all the main modes of their work. A number of sources refer a unit test to a white box method for it is based on the knowledge of program organization. The authors tick to the opinion that tested functions and modules must be considered as black boxes because unit tests must not take into account the inner organization of the function. The ground for it can be the methodology of development when tests are developed before creating the function itself, which favors the increase in the control of their functionality from the point of view of the specification.

The black box method is quite suitable for parallel programs but anyway it can't be the method providing a sufficient level of reliability of the verification. At another system with another quantity of computational nodes or at a heterogeneous system a parallel program can behave in a completely different way. This unpleasant peculiarity sufficiently limits the attractiveness of the method in contrast to using it for the verification of usual programs.

Code review. This is the oldest, the most proved and reliable approach to searching for any kinds of defects. This methodology is based on the combination of code reading with a execution of a set of rules and recommendations very well described, for example, in Steve McConnell's book "Code Complete" [5]. Unfortunately this practice is inapplicable for a large-scale verification of modern program systems because of their big volume. Though this way gives beat results, it is not always used under the conditions of modern life cycles of the software development where a very important moment is the period of development and the time of the product release to the market. That's why the code review generally comes to rare meetings the aim of which is training new and not less experienced staff to write a high quality code than for verification of work of a number of modules. This is a very good way of increasing programmers' qualification but it can't be considered as a complete means for quality control of a developed system.

The methodology of static code analysis, which will be described further, comes to assist the programmers who are aware of the necessity of regular code review but do not have enough time. The main task of the methodology is reducing the code volume demanding the man's attention and thus reducing the time for its review.

The dynamic code analysis is a software analysis, which is carried out with the help of executing programs at a real or virtual processor. A dynamic analysis is often understood as a program code study with the aim of its optimization. But we will speak of the dynamic analysis as of the method of parallel programs testing.

For dynamic race detection, the program is executed and a history of its memory accesses and synchronization operations is recorded and analyzed. Because the history is in fact a feasible execution, dynamic detectors typically suffer a lower false positive rate than static detectors. On the other hand, since not all possible execution paths are considered, the dynamic technique is not sound and thus cannot certify a program to be free of data races. The dynamic technique also imposes runtime overhead which must be kept within limits while still providing reasonably accurate race detection. The post-mortem approach records events during program execution and analyzes them later, or records only critical events and then carefully replays the program. This approach is unsuitable for long-running applications that have extensive interactions with their environment. The other approach is to record and analyze information as efficiently as possible on-the-fly [6].

Dynamic parallel code analyzers are already represented by the commercial solutions and have taken a deserved place in a row of other code testing systems (a vivid example is a set of tools Intel Threading Tools [7]). In the scope of it we will not speak about this dynamic analysis methodology and hope that this methodology and corresponding tools will be helpful in creating reliable parallel systems.

But like it was mentioned before, dynamic analysis does not make it possible to find all the errors for it is often impossible to execute the whole tested program code or the succession of its execution will be very different from the real system. In this case the static analysis will be helpful, and we will describe it in more detail. It is connected with the fact that all researches and realizations in this direction are more likely to have a theoretical character and need more research and practical work.

Static code analysis is a software analysis which is carried out without real execution of programs under research. Depending on the tool which is used the depth of the analysis can vary from defining the behavior of certain operators to the analysis including all the source code. Lately, the static analysis was mostly used for verification of software properties, which is used in computer systems of high reliability.

In case of parallel programs the static analysis makes it possible to detect errors even in those sections of the code, which get called only in very rare cases. Regardless of the stage of execution, static analysis makes it possible to detect potential errors which may not show up during the work at the whole class of apparatus or program systems. The static analysis can thus substantially complement dynamic analysis and be effectively used in parallel software development life cycle.

3. The sphere of effective application of static parallel code analyzer

The obstacle on the way of wide introduction of static analysis method consists in the fact that difficulty of creating such systems is much higher than dynamic ones. That's why in the network of this article we will offer not an attempt to describe and create an ideal analyzer but to find a sphere of application for which the creating of the analysis is an observed task and can give a good practical effect.

For this reason let us consider main levels of decomposition of the task in case of creating parallel algorithms. Let us name these levels:

Division of tasks into subtasks.
Division of subtasks into many quasi-homogeneous procedures which are executed at the same time with different source data. In the mathematical physics such type of parallelism is called geometrical parallelism for paralleling here is carried out with the help of calculations distribution in different points of calculation sphere into different processors.
Paralleling different procedures.
Paralleling certain processor commands. Optimizing compilers and processors proper cope with it quite well already.

Division of tasks into subtasks. It is an ideal case of paralleling when the task is divided into several parts each of which can be solved separately. In this case the system of interaction of parallel threads is quite simple and there's no sense to talk about a static analysis.

Division of each subtask into many parallel procedures. On the one hand it is the most interesting level for analysis. This is exactly the level of parallel procedures where the most part of mistakes is made, which is connected with particular difficulty of such systems. This level is also interesting with the fact that it helps to achieve good temporary indicators of paralleling. That's why many researchers start to work with static analysis at this level and find it very difficult. They consider demonstration examples. But when they get to real systems, moreover written on such difficult language a C++, then it becomes a very difficult task to create a working static analyzer. As the result we observe practical absence of commercial solutions based only on static analysis. Dynamic-static analysis is practically used when after the stage of program execution static analysis is carried out of the received data and source code.

Thus, it is possible to make a conclusion that though static analysis is interesting for the second level, but it is difficult to be realized in practice. But one more step can be made on getting lower to the third level, which is often forgot, and to find out that this is the very level where static analysis can give the best result.

This is the level of different systems organizing the execution of both separate functions and sectors of function code. MPI, OpenMP, OpenTS can be referred here. Exactly for such systems the use of static code analyzers is considered the most reasonable.

We will consider the possibility of application of static code analyzer on the example of OpenMP as the most actively developing technology. The standard OpenMP was developed in the year 1997 as API, aimed at writing sorted multithread applications. First it was based on Fortran language but later included C/C++. The last version of OpenMP is 2.0; it is fully supported by Visual C++ 2005. The standard OpenMP is supported by the platform Xbox 360.

4. Static code analysis on the example of the technology OpenMP

Let us consider two simple examples demonstrating the cases where static analyzers may be helpful, for example, in the process of writing a new code. Let us imagine that a programmer written the following code:

First example:

size_t i;
#pragma omp parallel sections num_threads(2)
{
  #pragma omp section
  {
    for (i = 0; i != arraySize; ++i)
      array[i].a = 1;
  }
  #pragma omp section
  {
    for (i = 0; i != arraySize; ++i)
      array[i].b = 2;
  }
}

In this example the error consists in using one variable 'i' for the two simultaneously executed cycles which leads to an emergency program completion. The troubleshooting of this error can be easily realized in a static analyzer with the help of verification that the same variable is used for recording in two parallel sections. Of course such mistake is likely to be detected at the testing stage, but the integration of a static analyzer in the process of development can substantially save time at the expense of preliminary analysis just before the beginning of the testing.

The next example is also very simple, but it can arouse the necessity to spend much time on searching an error. This code leads to calculation of the wrong magnitude of the sum aroused by the absence of OpenMP directive reduction(+: sum).

Second example:
int sum = 0;
#pragma omp parallel num_threads(2)
{
  #pragma omp for
    for (int i = 0; i < arraySize; ++i)
      sum += array[i];
}
_tprintf(_T("sum=%i\n"), sum);

Such constructions of recording in variable 'sum' from two threads is also easily diagnosed by static analyzers and make it possible to substantially facilitate the work on program code verification.

Conclusion

As it was said before, no commercial static analyzers for parallel systems support are represented at the market At the same time, there's no doubt in the necessity of creating such tool, for the existing means of development of parallel programs do not make it possible to solve all the problems. Let us describe one more time what such solution may look like. The developer prepares parallel code and then starts the static analyzer, which informs whether the code is correct or not. In case if the developed code contains errors, then the diagnostic message will make it possible to quickly find and correct it.

We hope that on appearance of the tool of static parallel code analyzer the amount of program errors in the developed systems will reduce.

References

Kang Su Gatlin, Pete Icency. OpenMP and C++ // MSDN Magazine #10, 2005.
Sergey Abramov, Alexei Adamovich, Alexander Inyukhin, Alexander Moskovsky, Vladimir Roganov, Elena Shevchuk, Yuri Shevchuk, and Alexander Vodomerov. 2005. OpenTS: An Outline of Dynamic Parallelization Approach. Parallel Computing Technologies: 8th International Conference, PaCT 2005, Krasnoyarsk, Russia, September 5-9, 2005. Proceedings. Editors: Victor Malyshkin - Berlin etc. Springer, 2005. - Lecture Notes in Computer Science: Volume 3606, pp. 303-312
Daykstra E, The programming discipline. M.: Mir, 1978.
Valiev M.K. Application of temporary logics to programs specification. // Programming. 1998, 2, p. 3-9.
Steve McConnell, "Code Complete, 2nd Edition" Microsoft Press, Paperback, 2nd edition, Published June 2004, 914 pages, ISBN: 0-7356-1967-0.
Yuan Yu, Tom Rodeheffer, Wei Chen. RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking // SOSP05, October 23-26, 2005, Brighton, United Kingdom.
Clay R. Breshirs, The set of instruments Intel Threading Tools and OpenMP library.