The article describes the testing technologies used in developing the PVS-Studio static code analyzer. The developers of this tool for programmers discuss the principles of testing their own product, which may be of interest to developers of similar packages for processing text data or source code.
When developing and promoting PVS-Studio, we pay much attention to software quality, development processes and the principles of managing programmers' work. But we have never discussed how we ourselves develop our own product. Do we use the technologies, recommendations and practices that we describe in our articles? After all, does the saying "the shoemaker's wife is the worst shod" apply to us?
In this article, we decided to tell you how we test PVS-Studio. On the one hand, we do this to convince our (potential) users of the quality of our tool. On the other hand, we want to share our successful experience of using these practices.
PVS-Studio is a static code analyzer intended for developers of modern resource-intensive C and C++ applications. By "modern applications" we mean 64-bit and/or parallel applications. Developing such programs involves difficulties that traditional programs do not have: besides common and familiar errors such as uninitialized pointers, which any compiler detects, there are specific types of problems.
We mean errors that occur when porting 32-bit applications to a 64-bit platform, or when parallelizing code for multi-processor or multi-core systems. Such applications are rather difficult to develop because there are few tools that simplify the creation of 64-bit and parallel programs. PVS-Studio is such a tool.
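When developing PVS-Studio we use six basic testing methods: static code analysis of our own code; unit-tests at the level of classes, methods and functions; functional tests at the level of independently written files; functional tests at the level of separate files; functional tests at the level of separate third-party projects and solutions; and functional testing of the user interface.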
Let's briefly describe these methods.
Because PVS-Studio is a static code analyzer, we use static code analysis to search for issues in the analyzer itself. First, we do not want to be reproached for not using our own product. Second, it really helps us find errors before our users do.
Unit-tests at the level of classes, methods and functions allow us to make sure that adding new functionality will not break the existing code. At this level, separate program entities are tested with the help of simple small tests. In other words, these are ordinary unit-tests.
Functional tests at the level of independently written files allow us to make sure that everything that must be diagnosed is still diagnosed. Every potential problem that the analyzer has ever detected in this code must keep being detected.
Functional tests at the level of separate files allow us to make sure that the analyzer processes various separate source files without any issues. By issues here we mean triggered ASSERTs, unexpected error warnings and crashes. A code analyzer is a program like any other and, alas, crashes in it are rather frequent.
Functional tests at the level of separate third-party projects and solutions allow us to make sure that the analyzer can still check whole projects and that the number of diagnostic warnings changes from version to version in a controlled rather than chaotic way.
User interface functional testing allows us to verify the operability of the extension plug-in used to integrate the analyzer into the Visual Studio IDE. These tests verify not only the availability and expected functionality of individual UI elements, but also the expected interaction between these elements and the analyzer itself.
Of course, besides these methods there are other common approaches like "a programmer's alert eyes" or "we release tomorrow, check THIS", but they have proved their disadvantages and we try not to use them.
Now let's speak about the methods we use in detail.
Does the section's title puzzle you? What else can perform static code analysis but a static code analyzer? However, there are always some peculiarities arising when developing a tool for programmers.
As you perhaps know, the first version of a compiler for a programming language is rarely written in that language. As a rule, the compiler for a new language is first developed in a different language that is standard at the time. C++ compilers are written in C++ nowadays, but that was not an option for the very first one.
In the same way, when developing the first version of the static code analyzer, we could not check it with the analyzer itself. That is why the first version of our product (it was called Viva64 then) was not 64-bit at all! But version 1.10, which appeared on January 16, 2007 (that is, 17 days after the release of the first version), contained the following line among other innovations:
Thus, as soon as we got a static analyzer detecting problems of 64-bit code, we started testing our own code with its help.
Do we benefit from static analysis of our own product? Of course, yes. Still, our case has its own peculiarities because of the specific nature of our task. Since we write articles on how to write 64-bit code, we do not make 64-bit mistakes in new code. But static analysis helps us improve the diagnostics themselves. For example, we can see which syntactic constructs obviously lead to false positives and exclude them from diagnosis.
Thus, we benefit from using static analysis.
Unit-tests at the level of classes, methods and functions are a set of tests for checking separate logical elements of a program. Here are some examples of functionality we provide unit-tests for:
Of course, our unit-tests cover more than these areas; we just gave some examples.
How do we use these unit-tests? When we fix an error in the analyzer or add new functionality, the unit-tests are launched in the release build. If they run without problems, they are then launched in the debug build under the debugger. This is done to make sure that none of the ASSERTs the code abounds with are triggered. It will become clear later why you should not launch the debug version immediately.
Although such tests do allow us to detect errors, they are not a fully adequate solution. The point is that PVS-Studio has too few unit-tests compared with the number needed to check what is "almost a compiler". Providing wide unit-test coverage is therefore very difficult in our case: it is a very labor-intensive task that we will hardly manage.
However, unit-tests at the level of classes, methods and functions are the first "defense" layer in our testing system.
When developing a code analyzer, it is important not to "lose" the diagnosis of potential errors that the analyzer has been able to detect from the very beginning. Here we use functional tests at the level of independently written files. Absolutely all detectable potential problems are gathered into separate files in the form of source code. These files are marked in a special way: the lines where the code analyzer must detect errors contain special marks, and each mark states how many errors there should be in that line (one, two and so on). When something fails to be diagnosed, we see it at once. For example, two errors used to be reported in line 17, and now only one is. Or, vice versa, unnecessary warnings have appeared.
In its working principle this approach resembles functional tests at the level of separate projects (we will speak about them further), but it differs from them in its high speed and in the fact that the files being tested were written (and marked) independently.
Besides, this system is convenient when developing new diagnostic warnings. First, you manually write code in which the new error must be diagnosed and mark it, and only then proceed to implementing the diagnostic itself. Until the diagnostic is implemented, the testing system reports that a diagnosable error must be present here but is absent. As soon as the diagnostic is implemented, the test passes.
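The marking scheme described above can be illustrated with a minimal sketch. The article does not show the real mark syntax, so the `//ERR:N` comment used here, as well as the function names, are purely hypothetical; the analyzer's output is reduced to a plain list of line numbers, one entry per warning.

```python
import re
from collections import Counter

# Hypothetical mark: a trailing comment such as "//ERR:2" means that the
# analyzer must report exactly two warnings on that line.
MARKER = re.compile(r"//ERR:(\d+)")

def expected_counts(source_text):
    """Map 1-based line number -> expected number of warnings."""
    expected = {}
    for number, line in enumerate(source_text.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            expected[number] = int(match.group(1))
    return expected

def check_file(source_text, reported_lines):
    """Compare the marks with the line numbers the analyzer reported.

    reported_lines holds one entry per warning, so a line that produced
    two warnings appears twice. Returns a list of readable mismatches.
    """
    expected = expected_counts(source_text)
    actual = Counter(reported_lines)
    problems = []
    for number in sorted(set(expected) | set(actual)):
        want = expected.get(number, 0)
        got = actual.get(number, 0)
        if got < want:
            problems.append(f"line {number}: expected {want}, got {got} (diagnosis lost)")
        elif got > want:
            problems.append(f"line {number}: expected {want}, got {got} (unnecessary warning)")
    return problems
```

An empty result means the file passes. A marked line with no implemented diagnostic yet simply shows up as "diagnosis lost" until the diagnostic is written.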
Testing files from real projects (instead of separate classes/methods or manually created files) allows us to provide wider coverage of the code. It is the fourth layer in our testing system. Our tests include separate preprocessed files from various projects: wxWidgets, fox-toolkit, CImg, Lame, Boost, CxImage, FreeType, etc. They also include preprocessed files built on the basis of standard system header files (CRT, MFC, etc.).
After adding new functionality or correcting existing functionality, the programmer first launches the release version of the tests and then the debug version. Why not launch the debug version immediately? It's very simple: the release version of the tests runs for one minute and the debug version for five minutes.
Testing at the level of separate files is a very powerful instrument. It allows you to instantly detect an error if some functionality has "fallen off" during the analyzer's development. A great number of errors have been avoided thanks to these tests.
The most powerful tier of our testing system is its fifth layer, which consists of testing on separate projects and solutions. It is thanks to this system that each new version of PVS-Studio is at least as good as the previous one.
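The "fast run first, slow run second" order can be sketched as follows. This is only an illustration: `run_staged` and the stage names are hypothetical, and each stage is represented by a callable that returns True on success instead of a real test launcher.

```python
def run_staged(stages):
    """Run (name, callable) stages in order and stop at the first failure.

    Each callable returns True on success. Returns the names of the
    stages that were actually executed.
    """
    executed = []
    for name, run in stages:
        executed.append(name)
        if not run():
            # A failure in the quick stage saves the time of the slow one.
            break
    return executed
```

With the quick release run placed first, a broken build is reported after about a minute, and the five-minute debug run is started only when the release run is clean.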
This system operates as follows. There are dozens of projects and solutions of various programs available on the Internet, for example Lame, eMule, Pixie, Loki, etc. Each of these projects has been checked with PVS-Studio and the results have been saved (in the form of a PVS-Studio log file). After installing a new version (the version being developed), we launch a system we have developed specially for this purpose. It opens each project in turn, checks it with PVS-Studio, saves the results and then compares them with the reference ones. If there are differences, it saves them into a separate file (an analogue of a standard diff) that you can easily view with the help of PVS-Studio to find out the reasons for these differences.
For example, suppose a new diagnostic warning with code V118 has appeared in the new version of PVS-Studio. We launch the testing system and it should report that V118 warnings have appeared in some projects. Then we manually look through all the changes in the results and decide whether warning V118 has been generated correctly or not.
But if, besides warning V118, we see that some V115 warnings are missing, the tests have revealed a defect in the current version and it is sent back for improvement. If all the changes are accepted as correct, the new log files become the reference ones, and the next time the new results will be compared with them.
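The comparison with the reference results can be sketched like this. It is a simplified illustration, not the real system: the actual PVS-Studio log format is not shown in the article, so a warning is assumed to be a `(file, line, code)` tuple, and the names `diff_logs` and `review` are hypothetical.

```python
from collections import Counter

def diff_logs(reference, current):
    """Return (missing, added) warnings as sorted lists.

    Each log is an iterable of (file, line, code) tuples; this layout is
    an assumption, not the real PVS-Studio log format.
    """
    ref, cur = Counter(reference), Counter(current)
    missing = sorted((ref - cur).elements())  # were reported, now gone
    added = sorted((cur - ref).elements())    # newly reported
    return missing, added

def review(reference, current, new_codes):
    """Split the differences into expected additions and suspicious changes."""
    missing, added = diff_logs(reference, current)
    expected = [w for w in added if w[2] in new_codes]
    unexpected = [w for w in added if w[2] not in new_codes]
    return {"expected": expected, "unexpected": unexpected, "missing": missing}
```

In terms of the example above, `new_codes` would be `{"V118"}`: added V118 warnings land in `expected` and still need a manual look, while any lost V115 warning lands in `missing` and sends the version back for improvement.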
This system has one more purpose. Because PVS-Studio is intended to work in Visual Studio 2005, 2008 and 2010, we always make sure that the warning messages coincide in the different versions of Visual Studio. That is, if we have got, for example, 10,000 diagnostic warnings for all the projects in Visual Studio 2005, we must get the same number of warnings in Visual Studio 2008.
How much time does such testing take? It is a difficult question, because the base of these tests keeps growing as new projects are added, and the time spent grows with it. A year ago, when the analyzer used only one core, the tests took about an hour. Then we made the analyzer run in several threads, and the running time was nearly halved on a two-core computer. Over time we added more and more new projects to the tests, and now they run for a bit more than an hour on the same two-core computer. Of course, all this refers to the release version.
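A cross-version check of this kind might look as follows. Again, this is a sketch under assumptions: warnings are `(file, line, code)` tuples, the version labels are arbitrary strings, and `version_mismatches` is a hypothetical helper that compares warning counts per diagnostic code.

```python
from collections import Counter

def count_by_code(warnings):
    """Count warnings per diagnostic code."""
    return Counter(code for _file, _line, code in warnings)

def version_mismatches(results):
    """results: dict of IDE version label -> list of (file, line, code).

    Returns {code: {version: count}} for every diagnostic code whose
    count differs between the versions; an empty dict means they agree.
    """
    counts = {version: count_by_code(w) for version, w in results.items()}
    codes = sorted(set().union(*counts.values()))
    mismatches = {}
    for code in codes:
        per_version = {version: counts[version][code] for version in counts}
        if len(set(per_version.values())) > 1:
            mismatches[code] = per_version
    return mismatches
```

Comparing per-code counts rather than one grand total makes it visible which diagnostic behaves differently in one of the IDE versions.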
The sixth tier is the automated user interface testing system. Because the analyzer itself is a console application, it has no user interface of its own. But to run, it needs compilation data collected for the files being checked and a large number of temporary configuration files, which makes launching it manually quite inconvenient. That's why, by analogy with the Visual Studio compiler, all these details are hidden from the end user and automated by the extension plug-in for the Visual Studio IDE. It is the GUI of this extension that is tested on the sixth tier of our system; in fact, what is verified is the additional functionality that the integrated PVS-Studio adds to the Visual Studio IDE interface.
To cover all the functional capabilities of PVS-Studio, the system uses 3 testing sets (one for each supported Visual Studio version), each containing individual testing scenarios (use cases). Each scenario verifies 3 to 5 test cases (a test case is a set of conditions that determine whether a predefined requirement is satisfied). All the testing sets contain identical use cases and identical requirements for each test case, because the PVS-Studio analyzer is intended to operate identically in every supported Visual Studio version.
The user interface testing system is implemented with Visual Studio Team Test, the unit testing system built into Visual Studio (the extension that enables automated user interface testing was added in Visual Studio 2010). It is also worth noting that the user interfaces of Visual Studio 2005 and 2008 are identical for this system: they share the same UI mapping and are covered by the same implementation of the testing scenarios. In contrast, the Visual Studio 2010 IDE has an entirely new UI based mostly on WPF elements, which requires a separate implementation of these testing scenarios.
As they say, there is no limit to perfection. We continue to develop our testing system in every direction and, of course, we are constantly enlarging the base of the tests.
In this article you have learned how we test our static code analyzer PVS-Studio. Perhaps our experience will help you introduce such testing practices into your own projects. And we hope that after reading about the process of testing PVS-Studio you will want to have a look at our tool.