![]() PVS-Studio Static Code Analyzer for 64-bit and parallel C/C++ code
|
||||||||||||||||||||
![]() ![]() ![]() ![]() ![]()
11.03.2010
Parallel notes N4 - continuing to study OpenMP constructs In this post we will continue to introduce you into OpenMP technology and tell you about some functions and new directives.»
02.03.2010
Parallel notes N3 - base OpenMP constructs Now we would like to start introducing you into OpenMP technology and show you the ways of using it.»
28.02.2010
In what way can C++0x standard help you eliminate 64-bit errors Programmers see in C++0x standard an opportunity to use lambda-functions and other entities I do not quite understand :).» ![]()
10.12.2009
PVS-Studio FAQ This paper contains some questions and answers about PVS-Studio code analyzer by OOO "Program Verification Systems".»
09.12.2009
VivaCore FAQ This paper contains some questions and answers about VivaCore C/C++ code analysis library by OOO "Program Verification Systems"»
23.11.2009
PVS-Studio: using the function "Mark as False Alarm"
The article describes and demonstrates by an example the use of PVS-Studio 3.40 new function "Mark as False Alarm". » ![]() |
Code Analysis![]() Using Static Analysis in Program DevelopmentAlexey Kolosov
| |||||||||||||||||||
int Func(){return 0;}
|
Lexer will process this line in the following way (see table 1):
| int | Func | ( | ) | { | return | 0 | ; | } |
| reserved word | identifier | reserved symbol | reserved symbol | reserved symbol | reserved word | integer constant | reserved symbol | reserved symbol |
The line will be identified as 8 correct tokens and these tokens will be passed to parser. The parser will check the context and find out that it is a declaration of function, which takes no parameters, returns an integer, and always returns 0.
The parser will find it out by creating an abstract syntax tree from the tokens provided by the lexer and analyzing the tree. If the tokens and the tree built from them will be considered to be correct, the tree will be used for the static analysis. Otherwise, the parser will report an error.
However, building an abstract syntax tree is not just organizing a set of tokens in a tree form.
An abstract syntax tree captures the essential structure of the input in a tree form, while omitting unnecessary syntactic details. ASTs can be distinguished from concrete syntax trees by their omission of tree nodes to represent punctuation marks such as semi-colons to terminate statements or commas to separate function arguments. ASTs also omit tree nodes that represent unary productions in the grammar. Such information is directly represented in ASTs by the structure of the tree.
ASTs can be created with hand-written parsers or by code produced by parser generators. ASTs are generally created bottom-up.
When designing the nodes of the tree, a common design choice is determining the granularity of the representation of the AST. That is, whether all constructs of the source language are represented as a different type of AST nodes or whether some constructs of the source language are represented with a common type of AST node and differentiated using a value. One example of choosing the granularity of representation is determining how to represent binary arithmetic operations. One choice is to have a single binary operation tree node, which has as one of its attributes the operation, e.g. "+". The other choice is to have a tree node for every binary operation. In an object-oriented language, this would results in classes like: AddBinary, SubtractBinary, MultiplyBinary, etc. with an abstract base class of Binary[3].
For example, let us parse two expressions: 1 + 2 * 3 + 4 * 5 and 1+ 2 * (3 + 4) * 5 (see figure 1):

As one can see from the figure, the expression can be restored to its original form if you walk the tree from left to right.
After the abstract syntax tree is created and verified, the static analyzer will be able to check, whether the source code fulfils the rules and guidelines specified by the code standard.
There are many different analysis techniques, such as AST walker analysis, dataflow analysis, path-sensitive data flow analysis, etc. Concrete implementations of these techniques vary from tool to tool. Static analysis tools for different programming languages can be based on various analysis frameworks. These frameworks contain core sets of common techniques, which can be used in static analysis tools so that these tools reuse the same infrastructure. The supported analysis techniques and the way these techniques are implemented varies from framework to framework. For example, a framework may provide easy way to create an AST walker analyzer, but has no support for data-flow analysis[4].
Although all the three above mentioned analysis techniques use the AST created by parser, the techniques differ by their algorithms and purposes.
AST walker analysis, as one can see from the term, is performed by walking the AST and checking whether it fulfils the coding standard, specified as a set of rules and guidelines. This is the analysis performed by compilers.
Data flow analysis can be described as a process to collect information about the use, definition, and dependencies of data in programs. The data flow analysis algorithm operates on a control flow graph (CFG), generated from the source code AST. The CFG represents all possible execution paths of a given computer program: the nodes represent pieces of code and the edges represent possible control transfers between these code pieces. Since the analysis is performed without executing the tested program, it is impossible to determine the exact output of the program, i.e. to find out which execution path in the control flow graph is actually taken. That is why data flow analysis algorithms make approximations of this behavior, for example, by considering both branches of an if-then-else statement and by performing a fixed-point computation for the body of a while statement. Such a fixed-point always exists because the data flow equations compute sets of variables and there are only a finite number of variables available since we only consider programs with a finite number of statements. Therefore, there is a finite upper limit to the number of elements of the computed sets which means that a fixed-point always exists. In terms of control flow graphs, static analysis means that all possible execution paths are considered to be actual execution paths. The result of this assumption is that one can only obtain approximate solutions for certain data flow analysis problems[5].
The data flow analysis algorithm described above is path-insensitive, because it contributes all execution paths - whether feasible or infeasible, heavily or rarely executed - to a solution. However, programs execute only a small fraction of their potential paths and, moreover, execution time and cost is usually concentrated in a far smaller subset of hot paths. Therefore, it is natural to reduce the analyzed CFG and, therefore, to reduce the amount of calculations so that only a subset of the CFG paths are analyzed. Path-sensitive analysis operates on a reduced CFG, which does not include infeasible paths and does not contain "dangerous" code. The paths selection criteria are different in different tools. For example, a tool may analyze only the paths containing dynamic arrays declaration, which is considered to be "dangerous" according to the tool's settings.
The number of static analysis tools and techniques grows from year to year and this proves the growing interest in static analyzers. The cause of the interest is that the software under development becomes more and more complex and, therefore, it becomes impossible for developers to check the source code manually.
This article gave a brief description of the static analysis process and analysis techniques.