Andrey Karpov

Nov 13 2015

Tags:

#CSharp

The First C# Project Analyzed

Nov 13 2015

Author: Andrey Karpov

Umbraco
About PVS-Studio 6.00
Why C#?
Interesting bugs found in the project
FAQ
Conclusion

The PVS-Studio team is now actively developing a static analyzer for C# code. The first version is expected by the end of 2015. And for now my task is to write a few articles to attract C# programmers' attention to our tool in advance. I've got an updated installer today, so we can now install PVS-Studio with C#-support enabled and even analyze some source code. Without further hesitation, I decided to scan whichever program I had at hand. This happened to be the Umbraco project. Of course we can't expect too much of the current version of the analyzer, but its functionality has been enough to allow me to write this small article.

Umbraco

Umbraco is an open-source content management system platform for publishing content on the World Wide Web and intranets. It is written in C#, and since version 4.5, the whole system has been available under an MIT License.

The project is of medium size, but its C# portion is fairly small, while most of the code is written in JavaScript. In all, the project consists of 3200 ".cs" files that make a total of 15 Mbytes. The number of C#-code lines is 400 KLOC.

About PVS-Studio 6.00

Analysis for this article was done using the alpha-version of PVS-Studio 6.00. The release will see two major changes:

Added C# support.
Disabled support for VS2005 and VS2008. The small quantity of our users who still work in these IDEs are suggested to continue using version 5.31 or next versions if they intend to do any bugfixing.

The pricing policy won't change. We are not making a new product; we are just extending the capabilities of the existing one by simply introducing support for one more programming language. Previously, you could use PVS-Studio to scan projects written in languages C, C++, C++/CLI, and C++/CX. Now you will get the option to analyze C# projects as well. This won't affect the price in any way. Those who have already purchased the tool to analyze C++ code will be able to analyze C# code too.

Why C#?

I would often argue at conferences that creating a C# analyzer didn't look like an interesting job. Lots of bugs peculiar to C++ are simply impossible in C#. And that's really so. For example, C# doesn't have such functions as memset(); therefore, it doesn't suffer from the tons of troubles related to it (see examples for memset(): V511, V512, V575, V579, V597, V598).

But I gradually changed my mind. You see, most of the bugs detected by PVS-Studio have to do with programmers' carelessness rather than language specifics. By carelessness I mean typos and poor modifications of copy-pasted code. This is what the PVS-Studio analyzer is really good at, and we thought that what had helped in C++ would also help in C#.

The C# language doesn't protect you from typing a wrong variable name or the "last line effect" which has to do with lack of attention.

Another important thing that prompted us to make a C# analyzer was the release of Roslyn. Without it, development would have been just too costly.

Roslyn is an open-source platform for analysis and compilation of C# and Visual Basic languages. Roslyn does two basic operations: it builds a syntax tree (parsing) and compiles it. In addition, it allows you to analyze the source code, recursively traverse it, handle Visual Studio projects, and execute the code at runtime.

Interesting bugs found in the project

For C++, my favorite diagnostic is V501. Now it has a counterpart in the C# module as well - V3001. Let's start with this one.

Code sample No.1

There is an attribute called "focalPoint":

[DataMember(Name = "focalPoint")]
public ImageCropFocalPoint FocalPoint { get; set; }

This attribute is of type 'ImageCropFocalPoint' which is defined as follows:

public class ImageCropFocalPoint
{
  [DataMember(Name = "left")]
  public decimal Left { get; set; }

  [DataMember(Name = "top")]
  public decimal Top { get; set; }
}

It's hard to make any mistake when working with an attribute like that, isn't it? Well, the author of that code made one - a sad typo in method HasFocalPoint():

public bool HasFocalPoint()
{
  return FocalPoint != null &&
   FocalPoint.Top != 0.5m && FocalPoint.Top != 0.5m;
}

'Top' is checked twice, while 'Left' is not checked at all.

PVS-Studio's diagnostic message: V3001 There are identical sub-expressions 'FocalPoint.Top != 0.5m' to the left and to the right of the '&&' operator. ImageCropDataSet.cs 58

Code sample No.2

protected virtual void OnBeforeNodeRender(ref XmlTree sender,
            ref XmlTreeNode node,
            EventArgs e)
{
  if (node != null && node != null)
  {
    if (BeforeNodeRender != null)
      BeforeNodeRender(ref sender, ref node, e);    
  }
}

PVS-Studio's diagnostic message: V3001 There are identical sub-expressions 'node != null' to the left and to the right of the '&&' operator. BaseTree.cs 503

The 'node' reference is checked twice. The 'sender' reference was probably meant to be checked too.

Code sample No.3

public void Set (ExifTag key, string value)
{
  if (items.ContainsKey (key))
    items.Remove (key);
  if (key == ExifTag.WindowsTitle ||   // <=
      key == ExifTag.WindowsTitle ||   // <=
      key == ExifTag.WindowsComment ||
      key == ExifTag.WindowsAuthor ||
      key == ExifTag.WindowsKeywords ||
      key == ExifTag.WindowsSubject) {
    items.Add (key, new WindowsByteString (key, value));
  ....
}

PVS-Studio's diagnostic message: V3001 There are identical sub-expressions 'key == ExifTag.WindowsTitle' to the left and to the right of the '||' operator. ExifPropertyCollection.cs 78

'key' is compared twice to the 'ExifTag.WindowsTitle' constant. I can't say for sure how serious this bug is. Perhaps one of the checks is just superfluous and can be removed. But it's also possible that the comparison should be done over some other variable.

Code sample No.4

Here's another example where I'm not sure if there is a real error. However, this code is still worth reviewing.

We have an enumeration with 4 named constants:

public enum DBTypes
{
  Integer,
  Date,
  Nvarchar,
  Ntext
}

For some reason, the SetProperty() method handles only 3 options. Again, I'm not saying this is a mistake. But the analyzer suggests reviewing this fragment and I totally agree with it.

public static Content SetProperty(....)
{
  ....
  switch (((DefaultData)property.PropertyType.
    DataTypeDefinition.DataType.Data).DatabaseType)
  {
    case DBTypes.Ntext:
    case DBTypes.Nvarchar:
      property.Value = preValue.Id.ToString();
      break;

    case DBTypes.Integer:
      property.Value = preValue.Id;
      break;
  }
  ....
}

PVS-Studio's diagnostic message: V3002 The switch statement does not cover all values of the 'DBTypes' enum: Date. ContentExtensions.cs 286

Code sample No.5

public TinyMCE(IData Data, string Configuration)
{
  ....
  if (p.Alias.StartsWith("."))
    styles += p.Text + "=" + p.Alias;
  else
    styles += p.Text + "=" + p.Alias;
  ....
}

PVS-Studio's diagnostic message: V3004 The 'then' statement is equivalent to the 'else' statement. TinyMCE.cs 170

Code sample No.6, No.7

At the beginning of the article, I said that C# doesn't protect you from the "last line effect". Here's an example to prove that:

public void SavePassword(IMember member, string password)
{
  ....
  member.RawPasswordValue = result.RawPasswordValue;
  member.LastPasswordChangeDate = result.LastPasswordChangeDate;
  member.UpdateDate = member.UpdateDate;
}

PVS-Studio's diagnostic message: V3005 The 'member.UpdateDate' variable is assigned to itself. MemberService.cs 114

The programmer was copying class members from the object 'result' to 'member'. But at the end (s)he relaxed and unwittingly copied the member 'member.UpdateDate' into itself.

Another thing that makes me feel suspicious about this code is that the method SavePassword() deals with passwords, and it means that one must be especially careful about it.

The same code fragment can be found in file UserService.cs (see line 269). My guess is that the programmer simply copied it there without checking.

Code sample No.8

private bool ConvertPropertyValueByDataType(....)
{
  if (string.IsNullOrEmpty(string.Format("{0}", result)))
  {
    result = false;
    return true;
  }
  ....
    return true;
  ....
    return true;
  ....
    return true;
  ....
    return true;
  ....
  ....
  return true;
}

PVS-Studio's diagnostic message: V3009 It's odd that this method always returns one and the same value of 'true'. DynamicNode.cs 695

The method uses lots of 'if' and 'return' statements. What doesn't look right to me is that all the 'return' statements return 'true'. Isn't there a bug somewhere? What if some of those should return 'false'?

Code sample No.9

Now let's test your attentiveness: try to find a bug in the code fragment below. Just examine the method but don't read my explanation after it. To prevent you from accidentally reading it, I inserted a separator (a unicorn image :).

public static string GetTreePathFromFilePath(string filePath)
{
  List<string> treePath = new List<string>();
  treePath.Add("-1");
  treePath.Add("init");
  string[] pathPaths = filePath.Split('/');
  pathPaths.Reverse();
  for (int p = 0; p < pathPaths.Length; p++)
  {
    treePath.Add(
      string.Join("/", pathPaths.Take(p + 1).ToArray()));
  }
  string sPath = string.Join(",", treePath.ToArray());
  return sPath;
}

Figure 1. Separating code from explanation.

PVS-Studio's diagnostic message: V3010 The return value of function 'Reverse' is required to be utilized. DeepLink.cs 19

When calling the Reverse() method, the programmer intended to change the array 'pathPaths'. (S)he was probably mislead by the fact that an operation like that is totally correct when we deal with lists (List<T>.Reverse). But when applied to arrays, the Reverse() method doesn't change the original array. To work with arrays, this method is implemented through extension method Reverse() of the 'Enumerable' class and returns a modified collection rather than reverse the items directly.

A correct way of doing that would be like this:

string[] pathPaths = filePath.Split('/');
pathPaths = pathPaths.Reverse().ToArray();

Or even like this:

string[] pathPaths = filePath.Split('/').Reverse().ToArray();

Code sample No.10

The PVS-Studio analyzer output a few V3013 warnings reporting some methods whose bodies looked strangely alike. To my mind, all of those are false positives. Only one of the warnings is probably worth checking out:

public void GetAbsolutePathDecoded(string input, string expected)
{
    var source = new Uri(input, UriKind.RelativeOrAbsolute);
    var output = source.GetSafeAbsolutePathDecoded();
    Assert.AreEqual(expected, output);
}
public void GetSafeAbsolutePathDecoded(string input, string expected)
{
    var source = new Uri(input, UriKind.RelativeOrAbsolute);
    var output = source.GetSafeAbsolutePathDecoded();
    Assert.AreEqual(expected, output);
}

PVS-Studio's diagnostic message: V3013 It is odd that the body of 'GetAbsolutePathDecoded' function is fully equivalent to the body of 'GetSafeAbsolutePathDecoded' function. UriExtensionsTests.cs 141

Inside the GetAbsolutePathDecoded() method, we may need to use

source. GetAbsolutePathDecoded()

instead of

source.GetSafeAbsolutePathDecoded()

I'm not sure about that, but this spot should be inspected.

FAQ

The article is meant for a new audience, so I anticipate a number of questions people may want to ask. I'll try to answer these questions in advance.

Did you report the bugs you'd found to the project developers?

Yes, we try to do it all the time.

Do you run PVS-Studio on itself?

Yes.

Does PVS-Studio support Mono?

No.

For more detailed answers to these and other questions see the post "Readers' FAQ on Articles about PVS-Studio".

Conclusion

There aren't many bugs in this project. Our C++-oriented readers know why it happens so, but since we yet have to charm and lure C# programmers into our camp, I'll clarify some important points here:

A static analyzer is a tool meant for regular use. Its purpose is to find bugs at the earliest development stage. Running it on occasions doesn't make much sense because using it this way only helps detect non-critical bugs or bugs in rarely executed code. The reason is that between these runs, the real bugs are being fixed through enormous effort. They are found by programmers who then spend hours debugging the code; they are spotted by testers; or, what's worst of all, they are reported by users. Many of these bugs could be found and fixed right off if you used the analyzer regularly. So treat PVS-Studio as an extension to the C#-compiler's warnings. Hopefully you don't check the list of compiler warnings once a year, do you? All this stuff is discussed in more detail in the article "Leo Tolstoy and static code analysis".
In our articles, we only mention those code fragments we find interesting and worth telling about. We generally don't discuss cases when the analyzer sincerely suspects a bug in some code while it is actually clean. We call such code "smelling code". When using PVS-Studio, you'd better review such fragments. But discussing them in articles is beside the point.
We don't have this item for the C++ part of the analyzer, but it is relevant for C#. There are just a few diagnostics implemented for this module so far, but we are rapidly advancing. Just let our C#-unicorn grow a bit - and then it will show you how cool it is!

Thank you for reading this article, and may your programs stay bugless!

#CSharp