This article is an interview with Konstantin Knizhnik conducted by Andrey Karpov, an employee of the Program Verification Systems company. The interview discusses static code analysis, the relevance of solutions in this sphere, and the prospects of using static analysis in application development.
VivaMP support was dropped in 2014. If you have any questions, feel free to contact our support.
OOO "Program Verification Systems", a company developing tools for program testing and verification, asked Konstantin Knizhnik, a specialist in the methodology of static code analysis, to answer some questions. The interview was conducted and presented in the form of this article by Andrey Karpov, an employee of OOO "Program Verification Systems".
The interview touches upon static code analysis and the relevance of solutions in this sphere. The prospects of using static analysis in developing parallel applications are also discussed. An outside evaluation is given of the static analysis tools Viva64 and VivaMP developed by OOO "Program Verification Systems". In addition, some general issues of program verification are discussed which, we hope, will interest readers exploring this sphere of application testing.
Andrey Karpov, Candidate of Physico-Mathematical Sciences, is the technical director of the Program Verification Systems company; he develops the static code analysis tools Viva64 and VivaMP for testing 64-bit and parallel applications. He is the author of several articles on static code analysis.
Konstantin Knizhnik, Candidate of Physico-Mathematical Sciences, is the author of several articles devoted to static program code analysis and a developer of Java application verifiers; he has participated, and continues to participate, in many interesting projects, for example, WebAlta.
I have indeed worked on the subject of static analysis and program verification. It began in 1997, when I wrote a small program called jlint, a lint analogue for Java.
The program consisted of two parts. The first was a simple static analyzer for languages with C-like syntax. As is well known, C syntax has many constructs that lead to errors which are difficult to detect: "=" instead of "==", an empty loop body caused by a ";" put in the wrong place, and so on. I won't enumerate further, as I suppose these problems are familiar enough.
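Since Java inherited most of C's syntax, both pitfalls mentioned above survive there as well. A minimal illustrative sketch (all names are made up for this example, not taken from jlint):

```java
public class SyntaxPitfalls {
    // Pitfall 1: a stray ';' ends the loop, so the braced "body" below it
    // is just a free-standing block executed once, after the loop finishes.
    static int emptyLoopBody() {
        int iterations = 0;
        for (int i = 0; i < 5; i++);  // <- the ';' makes the loop body empty
        {
            iterations++;             // runs once, not five times
        }
        return iterations;
    }

    // Pitfall 2: '=' instead of '==' in a condition. Unlike C, Java only
    // compiles this for booleans, but there the bug survives intact.
    static boolean assignmentInCondition() {
        boolean done = false;
        if (done = true) {            // assignment, so the branch is always taken
            return done;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(emptyLoopBody());          // 1, not 5
        System.out.println(assignmentInCondition());  // true, though 'done' began false
    }
}
```

Both mistakes are syntactically legal, which is precisely why a simple static audit that flags them is so valuable.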
The second part, jlint proper, is a more or less self-contained semantic analyzer for Java. I didn't want to get involved in writing my own Java parser at the time, so I decided to read the already compiled byte code and analyze it. The checks include null reference dereference, incompatible casts, identically true or identically false expressions, precision loss, and the like.
The most interesting feature of jlint was the ability to detect potential deadlocks in a program. Java has a very simple locking mechanism: a synchronized method or a synchronized(expr) construction. By analyzing these constructions, an attempt was made to build the lock graph (where the nodes are the locked resources) and to find cycles in this graph. Of course, it is impossible to build a precise graph, so we tried to build an approximate one using classes instead of concrete instances. Unfortunately, this check didn't work well on real projects and produced a lot of false positives.
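The lock-graph idea can be sketched in a few dozen lines. This is an illustrative reconstruction, not jlint's actual code: nodes are locked resources, an edge A -> B records "B is acquired while A is held", and a cycle in the graph signals a potential deadlock.

```java
import java.util.*;

public class LockGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    // Record that lock 'inner' is acquired while 'outer' is already held.
    public void addAcquisition(String outer, String inner) {
        edges.computeIfAbsent(outer, k -> new HashSet<>()).add(inner);
    }

    // Depth-first search for a cycle; a cycle means a possible deadlock.
    public boolean hasPotentialDeadlock() {
        Set<String> visiting = new HashSet<>(), done = new HashSet<>();
        for (String node : edges.keySet())
            if (dfs(node, visiting, done)) return true;
        return false;
    }

    private boolean dfs(String node, Set<String> visiting, Set<String> done) {
        if (visiting.contains(node)) return true;   // back edge -> cycle found
        if (done.contains(node)) return false;      // already fully explored
        visiting.add(node);
        for (String next : edges.getOrDefault(node, Set.of()))
            if (dfs(next, visiting, done)) return true;
        visiting.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        LockGraph g = new LockGraph();
        g.addAcquisition("AccountA", "AccountB");  // thread 1 locks A, then B
        g.addAcquisition("AccountB", "AccountA");  // thread 2 locks B, then A
        System.out.println(g.hasPotentialDeadlock());  // true
    }
}
```

Using classes rather than concrete instances as nodes, as the interview describes, keeps the graph finite but is also exactly why the check over-approximates and reports false positives.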
Some years later my jlint was noticed by the founder of TogetherSoft, the company which released a UML modeling tool and was moving toward a complete IDE for Java. And I began working at TogetherSoft. At first, however, I developed an OODBMS, then a version control system, and only after that did I take part in developing the Java verifier.
This time everything was serious: full syntactic parsing, data flow analysis, and other features were implemented. There were several hundred audits. Some of them, of course, were rather simple, but others called for rudiments of artificial intelligence. For example, the verifier detected mixed-up index variables in nested loops, or recognized standard sequences for working with a resource (open, read/write, close) and searched for places where these sequences were broken.
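The resource-sequence audit mentioned above amounts to checking operation traces against a small state machine. A hypothetical sketch (not the TogetherSoft verifier's actual code) of the protocol open -> (read|write)* -> close:

```java
import java.util.List;

public class ResourceProtocol {
    enum State { CLOSED, OPEN }

    // Returns true if the observed sequence of operations on one resource
    // follows the expected protocol; false marks a broken sequence.
    static boolean follows(List<String> events) {
        State state = State.CLOSED;
        for (String e : events) {
            switch (e) {
                case "open":
                    if (state == State.OPEN) return false;   // double open
                    state = State.OPEN;
                    break;
                case "read":
                case "write":
                    if (state != State.OPEN) return false;   // use before open
                    break;
                case "close":
                    if (state != State.OPEN) return false;   // close without open
                    state = State.CLOSED;
                    break;
                default:
                    return false;                            // unknown operation
            }
        }
        return state == State.CLOSED;  // resource must be closed at the end
    }

    public static void main(String[] args) {
        System.out.println(follows(List.of("open", "read", "close")));  // true
        System.out.println(follows(List.of("open", "read")));           // false: never closed
    }
}
```

A real verifier runs this kind of automaton over paths found by data flow analysis rather than over explicit event lists, but the checking logic is the same.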
In general, a lot was done, including some rather original things. It is amusing, for sure, to launch the verifier on real source code, for example the JDK, and get about a dozen critical null dereference errors. In most cases, of course, these occur in error handlers, that is, in code paths that are never actually executed.
Interestingly, there were no Java verifiers on the market at the time. So there was an opportunity for great success, but somehow we failed to take it.
TogetherSoft was sold to Borland. We had to add support for languages such as C++, Delphi, and Visual Basic, and provide integration with JBuilder and Eclipse. Our verifier did finally reach users, but in a very poor form (largely because of the need to work on an AST provided by other subsystems, which was too slow and didn't contain the information the verifier needed). By that time it was too late, for there were verifiers for nearly all popular Java IDEs. And although few of them attempted such deep analysis - in most cases everything was reduced to checking the syntax - these differences were not so easy to notice.
And then Borland was struck by the crisis, and for several years now I have been working in a completely different sphere.
At present I am taking part in several projects at once. For example, at WebAlta I work on search engines. Besides WebAlta, I participate in creating an OODBMS for plug-in systems. There are several other projects as well.
As my work at TogetherSoft, and then at Borland, was directly connected with program verification, I had to give up my jlint. By the way, you can download the existing jlint from my site: http://www.garret.ru/lang.html.
Well, I'll try to briefly list the main conclusions I've drawn over the years devoted to the problem of program verification.
1. Most errors in real large projects are found by the most "stupid" audits - take, for example, a missing break in a switch. It is very useful when such audits work interactively in the development environment, immediately marking the unsafe places.
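The missing-break bug looks like this in practice (an illustrative example, with made-up names):

```java
public class FallThrough {
    // Classifies a status code. The forgotten 'break' after case 200 is
    // exactly the kind of "stupid" bug a simple audit catches immediately.
    static String classifyBuggy(int code) {
        String result = "unknown";
        switch (code) {
            case 200:
                result = "ok";
                // 'break' forgotten here: execution falls through to case 404
            case 404:
                result = "not found";
                break;
            case 500:
                result = "server error";
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classifyBuggy(200));  // "not found", not "ok"
        System.out.println(classifyBuggy(500));  // "server error"
    }
}
```

The program compiles cleanly and even looks plausible when skimmed, which is why an interactive audit that flags fall-through is worth far more than its simplicity suggests.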
2. Messages should be divided by confidence level (May-Be-Null, Is-Null, ...), with the ability to turn off separate groups as well.
3. To perform a full analysis you need a symbolic calculator - to understand, for instance, that i+1 > i. Of course, I am aware of the overflow due to which this condition is not always true, but the verifier's task is to search for such suspicious places.
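The overflow caveat is easy to demonstrate: in Java's two's-complement arithmetic there is exactly one int value for which i+1 > i fails.

```java
public class SymbolicOverflow {
    public static void main(String[] args) {
        // A symbolic calculator would treat i + 1 > i as always true,
        // but wrap-around at Integer.MAX_VALUE provides the counterexample.
        int i = Integer.MAX_VALUE;
        System.out.println(i + 1 > i);  // false: i + 1 wraps around
        System.out.println(i + 1);      // Integer.MIN_VALUE, i.e. -2147483648
    }
}
```

This is why a verifier treats such facts as "suspicious-place heuristics" rather than as theorems: the symbolic claim holds for mathematical integers, not for machine ones.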
4. The worse a language is designed, the more work there is for the verifier. For example, C syntax causes a lot of programmer errors, and any C/C++ programmer has faced this more than once. In Java many of these defects were corrected, but still not all of them. Many of our audits were busy trying to detect enums (which until recently were absent in Java) by various heuristic rules and to provide them with something like static type checking. Of course, all this turned out to be useless in C#.
5. The most important thing in a verifier is to maintain a reasonable balance between suspiciousness and "talkativeness". If I get several thousand messages on a small project, I simply won't be able to check them all. So we need a division into criticality degrees. But if we take some rather critical error, for example may-be-null, which can be caused by code like the following:
if (x == null)
    log("x is null");   // hypothetical completion: the null case is only logged
return x.length();      // yet x is still dereferenced here
we must understand that having checked the first few suspicious places without finding an error in them, a programmer won't look at the remaining messages.
That's why the rules "it's better to say nothing than to say nonsense" and "if not sure, keep silent" are highly relevant for a verifier.
New languages (such as Java and C#), without explicit memory release and address arithmetic, have made a program's behavior nearly deterministic. They helped to get rid of millions of man-hours spent on debugging ("where is memory leaking", "who deleted this variable", etc.), and of tools like BoundsChecker, whose task was to fight the abuse of C/C++ capabilities.
But unfortunately, parallel programming - the creation of multithreaded applications, without which we cannot solve even a simple task nowadays - deprives us of this determinism and throws us back to the times when a programmer had to spend too much time on debugging and run tests around the clock: not to become convinced of the absence of errors (for a test can only show their presence, not prove their absence), but mostly to clear his own and his team leader's conscience.
Moreover, while earlier (and even now in C++) writing a parallel program demanded great effort, in C#/Java it is very easy to create a new thread. This seeming simplicity creates the illusion that parallel programming is simple, but unfortunately it is not, and as far as I know there are no parallelism models that do for concurrency what garbage collection did for ordinary programs (unless we speak of purely functional languages, where execution can be parallelized automatically without distracting the programmer).
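How easily this seeming simplicity turns into a race condition can be shown in a few lines. This is an illustrative sketch: two threads increment a plain int, and the read-modify-write of count++ can lose updates, while an AtomicInteger never does.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RacyCounter {
    static int count = 0;                                   // unsynchronized
    static final AtomicInteger safe = new AtomicInteger();  // the fix

    public static void main(String[] args) {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                count++;                 // racy: two threads can interleave
                safe.incrementAndGet();  // atomic: never loses an increment
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        System.out.println("atomic: " + safe.get());  // always 200000
        System.out.println("plain:  " + count);       // often less than 200000
    }
}
```

The racy version compiles, usually "works" on small inputs, and fails nondeterministically under load - exactly the class of bug that neither the compiler nor ordinary testing reliably catches, and that the verifiers discussed below target.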
If we cannot solve a problem at the level of proper language design, we have to provide support with static and dynamic verifiers. That's why I consider the detection of deadlocks and race conditions one of the most important tasks for modern verifiers.
Thank you very much for the links to these articles; I liked them and will send them to my colleagues. I think this information will be very useful for many of them, especially the young and inexperienced ones. When I worked at Digital Equipment, I came across the first 64-bit systems (DEC Alpha). The projects were mainly connected with porting (from VMS/VAX to Digital Unix/Alpha and from RSX/PDP to OpenVMS/Alpha). So we faced for ourselves all those problems of porting to a platform with a different word size that you describe in your articles. The case was even more complicated because Alpha required strict data alignment.
I haven't yet looked at the Viva64 and VivaMP tools themselves, but I promise that I will. From my experience with the verifier at TogetherSoft/Borland, however, I can say - or even warn - that, as in any commercial product, a verifier consists of roughly 10% interesting ideas and algorithms and 90% rather boring things without which, unfortunately, a commercial product cannot exist.
All in all, as everywhere, it turns out that it is rather easy to write some product (one person can do it in several months). But turning it into a real commercial product takes much more effort of a far less creative and interesting character. And being able to sell it and learn how to make money from it requires quite a different talent.
Thank you. I was glad to talk to you too.
We would like to thank Konstantin for the interview once again and for his permission to publish this material on the Internet. We find many of his pieces of advice very useful and will certainly try to apply them in our static analysis products.