- Discovering static analysis
- The biggest mistake you can make when adopting static analysis
- A fallacy: "Efficiency of static analysis can be estimated by comparing analysis results for the last year's code base release and the current one"
- Tips on how to obtain the maximum benefit when adopting static analysis
- The more practices you follow, the greater the effect you get
Some of our users run static analysis only occasionally. They find new errors in their code and, glad about this, willingly renew their PVS-Studio licenses. I should feel glad too, shouldn't I? But I feel sad - because used this way, the tool gives you only 10-20% of its potential benefit, while you could get 80-90% by using it differently. In this post I will tell you about the most common mistake among users of static code analysis tools.
Discovering static analysis
Let's first discuss the simplest scenario, which is the most common among those trying static code analysis for the first time. Some member of a developer team comes across an article, a conference talk, or even an advertisement for some static analyzer and decides to try it on his/her project. I'm not talking about PVS-Studio in particular - it might be any static analysis tool. The programmer, more or less easily, deploys the analyzer and runs a check. The following outcomes are possible:
- The tool fails to work. It doesn't pick up the project settings, or gathers the environment parameters incorrectly, or fails in any other way. The user naturally grows less inclined to trust it.
- The tool performs the check successfully and generates some diagnostic messages. The user studies them and finds them irrelevant. This doesn't mean the tool is absolutely poor; it has simply failed to demonstrate its strengths on this particular project. Perhaps it should be given a chance with another one.
- The tool generates a few relevant messages (among others) which obviously indicate that genuine bugs are present in the code.
Strictly speaking, it is the third case, when something real is found, that gets a team to start using the tool in practice.
The biggest mistake you can make when adopting static analysis
But at this point, when integrating static analysis into the development process, one may make a great mistake: namely, adopting the practice of running static analysis only before every release, for instance, or once a month. Let's see why such an approach is bad:
First, since you don't use the false-positive suppression mechanism, you will see the same old false positives again and again, and waste time investigating them. The more messages the analyzer generates, the less focused the programmer is.
Second, diagnostic messages are generated even for the code which you didn't touch between the checks. This means you'll get even more messages to examine.
Third, and most important, with such an approach the static analyzer won't find the errors that appear and get caught between two checks - the very errors you instead spent so much time and effort hunting down through other methods. This point is very important, and I want to discuss it in detail, also because it is exactly what people forget about when estimating the usefulness of static analysis. See the next section.
A fallacy: "Efficiency of static analysis can be estimated by comparing analysis results for the last year's code base release and the current one"
Some programmers suggest the following method to estimate the efficiency of static code analysis. Imagine a team that has been working on some project for several years and keeps all the project release versions (1.0, 1.1, 1.2, etc.). The idea is to take the latest version of some code analyzer and run it on last year's project source code - say, version 1.3. Then the same analyzer is run on the latest code base release - let it be version 1.7. This gives us two reports. We study the first one and find that the older project contains 50 genuine errors. Then we study the second report and see that the latest project still contains 20 of those 50 bugs (plus some new ones, of course). This means that 50 - 20 = 30 bugs have been fixed by alternative methods without the analyzer: through manual testing, for instance, or by users working with the release version, or otherwise. We conclude that the static analyzer could have helped to detect and fix those 30 errors quickly. If this number is large for the project and developers' time is expensive, we can estimate the economic benefit of purchasing and using the static code analyzer.
This approach to estimating economic efficiency is absolutely incorrect! You cannot use it to evaluate a static analyzer, because the method involves several errors at once.
First, you don't take into account the bugs that were added in version 1.4 of the code base and eliminated in version 1.6. You may argue: "Then we should compare two successive releases, for example 1.4 and 1.5!" But that is wrong too, since you still miss the errors that appeared after release 1.4 and were fixed before release 1.5.
Second, release versions of the code base are already debugged and contain few bugs - unlike the current version the developers are working on. The release surely wasn't as buggy and crash-prone as the day-to-day development code, was it? You do fix bugs between releases, but you detect them through other methods, which are naturally more expensive.
Here we should recall the table showing how the cost of fixing a bug depends on when it was introduced and when it was detected. You should understand that running static analysis only "before a release" automatically increases the cost of bug fixes.
Thus, you cannot truly evaluate efficiency of static analysis by simply running it on the last year's code base release and the current one and comparing the results.
Tips on how to obtain the maximum benefit when adopting static analysis
Over the years we have been working in the field of static code analysis, we have worked out several practices for obtaining the maximum benefit from static analysis tools. Although I will describe how these mechanisms are supported in PVS-Studio, the tips apply to any other static analysis tool.
Any static code analyzer generates false positives; this is the nature of the tool, and it can't be helped. Everybody tries to reduce their number, but it cannot be brought to zero. To avoid seeing the same false positives again and again, a good tool provides a mechanism to suppress them: you simply mark a message as a false positive, and the analyzer will not generate it at the next check. In PVS-Studio, the "Mark As False Alarm" function is responsible for this. See the documentation section "Suppression of false alarms" for details.
Despite being very simple, this recommendation may help you to save much time. Moreover, you will stay more focused when you have fewer messages to examine.
An effective way to use incremental analysis is to integrate it into the IDE so that the tool launches automatically when freshly modified files are compiled. Ideally, incremental analysis should run on the computer of every developer currently working on the code base. Many bugs will then be detected and fixed before the code even gets into the version control system, which greatly reduces the cost of bug fixes. PVS-Studio supports the incremental analysis mode.
If for some reason you cannot install the analyzer on all the developers' computers, you can check the code once every few days. To avoid a pile of messages referring to old files, static analysis tools provide an option like "check files modified in the last N days". You can set it to 3 days, for example. Although nothing technically prevents you from setting this parameter to any number (say, 7 or 10 days), we don't recommend doing that. Checking the code just once in 10 days repeats the "occasional use of the tool" mistake: if a bug is added today, found by testers tomorrow, filed in the bug tracker the day after tomorrow, and fixed in 5 days, running analysis once in 10 days is useless.
But checking the code once every two or three days can be very useful. PVS-Studio supports this option - see the settings command "Check only Files Modified In".
Regardless of whether you use incremental analysis on every developer's computer, a very useful practice is to run the analyzer on the whole code base every night. A highly important capability of the tool is therefore command-line launch, which PVS-Studio of course supports.
The more practices you follow, the greater the effect you get
Let's enumerate once again the tips on how to enhance the efficiency of static code analysis:
- Mark false positives to get fewer messages to study the next time.
- Use incremental analysis (automated check of freshly recompiled files).
- Check files modified in the last several days.
- Set the static analyzer to run every night on the build server.
If you follow all four recommendations, you'll get the highest payback from investing in static analysis tools. Of course, this sometimes cannot be achieved for various reasons, but you should certainly strive for it.