25.03.2015

Static code analysis

Static code analysis is the process of detecting errors and defects in software source code. Static analysis can be viewed as an automated code review process, so let us first look at code review itself.

Code review is one of the oldest and most reliable methods of defect detection. It involves several programmers carefully reading the source code together and giving recommendations on how to improve it. This process reveals errors, as well as code fragments that may turn into errors in the future. It is also generally held that the code's author should not have to explain how particular parts of the program work: the program's algorithm should be clear from the code and its comments. If it is not, the code needs improving.

Code review usually works well because programmers notice errors in someone else's code much more easily than in their own. To learn more about the code review method, see Steve McConnell's excellent book "Code Complete" [1].

The only major disadvantage of joint code review is its extremely high cost: several programmers have to be gathered at regular intervals to review new code, or to re-review code after the recommended changes have been made. The reviewers also need regular breaks, because attention fades quickly when large code fragments are reviewed at once, and a review carried out by tired people is of little use.

So, on the one hand, you want to review your code regularly; on the other hand, doing so is too expensive. Static code analysis tools are a good compromise. They can tirelessly process program source code and point the programmer to the code fragments that deserve a closer look. Of course, a program can never fully replace a code review performed by a team of programmers, but its benefit-to-cost ratio makes static analysis a good practice that many companies can adopt.

The tasks solved by static code analysis software can be divided into three categories:

  • Detecting errors in programs. We will discuss this in detail below.
  • Recommendations on code formatting. Some static analyzers let you check whether the source code follows the coding standard adopted by your company: for example, indentation in various constructs, the use of spaces versus tabs, and so on.
  • Computing metrics. A software metric is a measure that gives a numerical value for some property of software or its specifications. Many different metrics can be computed with the help of suitable tools.

There are other ways of using static code analysis tools as well. For instance, static analysis can be used to monitor and train new employees who are not yet familiar with the company's programming rules.

There are many commercial and free static code analyzers. Wikipedia maintains a large list of them: "List of tools for static code analysis". The range of languages supported by static analyzers is wide as well: C, C++, C#, Java, Ada, Fortran, Perl, Ruby, etc.

Like any other error detection methodology, static analysis has its strong and weak points. You should understand that there are no ideal software testing methods. Different methods will produce different results for different software classes. Only the combination of various methods will enable you to achieve the highest quality in your software.

The main advantage of static analysis is that it can greatly reduce the cost of eliminating defects in software: the earlier an error is detected, the cheaper it is to fix. For example, according to the data given in McConnell's book "Code Complete", fixing an error at the testing stage costs ten times more than fixing it at the coding stage:

Figure 1. The average cost of fixing defects depending on when they were introduced and when they were detected (the data for the table is taken from the book "Code Complete" by S. McConnell).


Static analysis tools allow you to quickly detect a lot of errors at the coding stage, which significantly reduces the cost of development for the whole project. For example, the PVS-Studio static code analyzer can run in the background right after compilation is done, and tell the programmer about potential errors, if there are any (see incremental analysis mode).

Other advantages of static code analysis are as follows:

  • Full code coverage. Static analyzers check even those code fragments that are executed very rarely. Such fragments usually cannot be tested by other methods. This allows you to find defects in exception handlers or in the logging system.
  • Static analysis does not depend on the compiler you use or on the environment where the compiled program will run. This allows you to find hidden errors that may reveal themselves only years after they were introduced, for instance, undefined behavior errors. Such errors can surface when you switch to another compiler version or use different optimization switches. Another interesting example of hidden errors is discussed in the article "Overwriting memory - why?".
  • You can easily and quickly detect misprints and the consequences of copy-paste. Detecting these errors by other methods is usually extremely inefficient and a waste of time and effort. It is frustrating to spend an hour debugging only to find out that the error is an expression like "strcmp(A, A)". People rarely remember such troubles when discussing typical errors, but practice shows that they take a lot of time to detect.

Disadvantages of static code analysis

  • Static analysis is usually weak at diagnosing memory leaks and concurrency errors. To detect such errors, an analyzer effectively has to execute part of the program virtually, which is very difficult to implement: such algorithms require too much memory and processor time. Static analyzers therefore usually limit themselves to diagnosing simple cases. Dynamic analysis tools are a more efficient way to detect memory leaks and concurrency errors.
  • A static analysis tool warns you about suspicious fragments, but such code may in fact be quite correct. These reports are called false positives. Only the programmer can tell whether the analyzer points to a real error or to a false positive. Reviewing false positives takes working time and weakens attention to the code fragments that really do contain errors.

Errors detected by static analyzers are rather diverse. Here, for example, is the list of diagnostics implemented in the PVS-Studio tool. Some analyzers focus on a certain area, or certain types of defect, while others support certain coding standards, for instance, MISRA-C:1998, MISRA-C:2004, Sutter-Alexandrescu Rules, Meyers-Klaus Rules, etc.

The field of static analysis is developing actively: new diagnostic rules and standards appear regularly, while some rules become obsolete. This is why it makes little sense to compare analyzers by the list of defects they can detect. The only way to compare tools is to run them on a set of projects and count the number of real errors each one finds. This subject is discussed in detail in the article "Difficulties of comparing code analyzers, or don't forget about usability".

Examples of errors detected by static code analysis

Myths about static analysis

References