Sergey Vasiliev

Feb 12 2021

Tags:

#CSharp #Knowledge

Should we initialize an out parameter before a method returns?

Back story
The out parameter modifier
Exploring Roslyn
Let's recap
Conclusion

Surely every C# developer has used out parameters. They seem extremely simple and clear. But is that really so? To kick things off, let's start with a self-test.

Let me remind you that out parameters must be initialized by the called method before exiting it.

Now look at the following code snippet and see if it compiles.

void CheckYourself(out MyStruct obj)
{
  // Do nothing
}

MyStruct is a value type:

public struct MyStruct
{ .... }

If you confidently answered "yes" or "no", I invite you to keep reading, because things are not that clear-cut...

Back story

Let's start with a quick flashback. How did we end up studying out parameters in the first place?

It all started with the development of another diagnostic rule for PVS-Studio. The idea of the diagnostic is as follows: one of the method parameters is of the CancellationToken type, but this parameter is not used in the method body. As a result, the program may not respond (or may respond too late) to cancellation requests, such as a user canceling an operation. While reviewing the diagnostic's warnings, we found code that looks something like this:

void Foo(out CancellationToken ct, ....)
{
  ....
  if (flag)
    ct = someValue;
  else
    ct = otherValue;
  ....
}

Obviously, this was a false positive, so I asked a colleague to add another unit test "with out parameters". He added several tests, including one like this:

void TestN(out CancellationToken ct)
{
  Console.WriteLine("....");
}

I was primarily interested in the tests with parameter initialization, but then I took a closer look at this one... And then it hit me! How does this code actually compile? Does it compile at all? It did compile. That's when I realized I had an article coming up. :)

As an experiment, we decided to replace CancellationToken with some other value type. For example, TimeSpan:

void TestN(out TimeSpan timeSpan)
{
  Console.WriteLine("....");
}

It does not compile. Well, that's to be expected. But why did the example with CancellationToken compile?

The out parameter modifier

Let's recall what the out parameter modifier is. Here are the main points taken from docs.microsoft.com (out parameter modifier):

The out keyword causes arguments to be passed by reference;
Variables passed as out arguments do not have to be initialized before being passed in a method call. However, the called method is required to assign a value before the method returns.

Please pay attention to the last sentence: the called method is required to assign a value before the method returns.
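To make the rule concrete, here is a minimal sketch of the usual out contract. The TryHalve helper and all names in it are hypothetical, made up for illustration: the caller passes an uninitialized variable, and the callee assigns it on every path before returning.

```csharp
using System;

public static class OutDemo
{
    // Hypothetical helper: 'value' is assigned on every path before returning.
    public static bool TryHalve(int input, out int value)
    {
        if (input % 2 != 0)
        {
            value = 0;      // assigned even on the failure path
            return false;
        }

        value = input / 2;
        return true;
    }

    public static void Main()
    {
        int result;                          // not initialized by the caller
        bool ok = TryHalve(10, out result);  // passed by reference
        Console.WriteLine($"{ok} {result}"); // prints "True 5"
    }
}
```

Removing either assignment to value in TryHalve is what normally triggers the CS0177 error discussed in this article.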

Here is the question. What is the difference between the following three methods, and why does the last one compile, while the first and second do not?

void Method1(out String obj) // compilation error
{ }

void Method2(out TimeSpan obj) // compilation error
{ }

void Method3(out CancellationToken obj) // no compilation error
{ }

So far, the pattern is not obvious. Maybe there are some exceptions described in the docs? For the CancellationToken type, for example. Although that would be a bit strange: what's so special about it? I did not find anything about this in the documentation above. Here's what the documentation suggests: "For more information, see the C# Language Specification. The language specification is the definitive source for C# syntax and usage."

Well, let's look at the specification. We are interested in the "Output parameters" section. Nothing new, it is all the same: "Every output parameter of a method must be definitely assigned before the method returns."

Well, since the official documentation and specification of the language did not give us answers, we will have to dig into the compiler. :)

Exploring Roslyn

You can download the Roslyn source code from the project page on GitHub. For experiments, I took the master branch. We will work with the Compilers.sln solution. As a starting project for experiments, we use csc.csproj. You can even run it on a file with our tests to make sure that the problem is reproducible.

For the experiments we will use the following code:

struct MyStruct
{
  String _field;
}

void CheckYourself(out MyStruct obj)
{
  // Do nothing
}

To check that the error really occurs, we build and run the compiler on the file with this code. And indeed, the error is right there: error CS0177: The out parameter 'obj' must be assigned to before control leaves the current method

By the way, this message is a good starting point for diving into the code. The error code itself (CS0177) is probably generated dynamically, whereas the format string for the message is most likely stored somewhere in the resources. And indeed, we find the ERR_ParamUnassigned resource:

<data name="ERR_ParamUnassigned" xml:space="preserve">
  <value>The out parameter '{0}' must be assigned to 
         before control leaves the current method</value>
</data>
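As a rough illustration of how such a resource can become the final message, here is a sketch that assumes the number behind CS0177 is simply 177 and that the text is produced with ordinary string formatting. The DemoErrorCode enum and the Describe helper are hypothetical stand-ins, not Roslyn's actual types.

```csharp
using System;

// Hypothetical stand-in for a compiler's error code enum.
public enum DemoErrorCode
{
    ERR_ParamUnassigned = 177
}

public static class MessageDemo
{
    // Format string copied from the resource shown above.
    private const string ParamUnassignedFormat =
        "The out parameter '{0}' must be assigned to before control leaves the current method";

    public static string Describe(DemoErrorCode code, string parameterName)
    {
        // "CS" plus the numeric value padded to four digits.
        string id = $"CS{(int)code:D4}";
        return $"error {id}: {string.Format(ParamUnassignedFormat, parameterName)}";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(DemoErrorCode.ERR_ParamUnassigned, "obj"));
    }
}
```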

Using the same name, we find the error code, ERR_ParamUnassigned = 177, as well as several places where it is used in the code. We are interested in the place where the error is added (the DefiniteAssignmentPass.ReportUnassignedOutParameter method):

protected virtual void ReportUnassignedOutParameter(
  ParameterSymbol parameter, 
  SyntaxNode node, 
  Location location)
{
  ....
  bool reported = false;
  if (parameter.IsThis)
  {
    ....
  }

  if (!reported)
  {
    Debug.Assert(!parameter.IsThis);
    Diagnostics.Add(ErrorCode.ERR_ParamUnassigned, // <=
                    location, 
                    parameter.Name);
  }
}

Well, that seems like the place we're interested in! We set a breakpoint and make sure that this fragment is what we need. Indeed, Diagnostics records exactly the message that we saw.

Well, that's great. And now let's change MyStruct to CancellationToken, aaand... We still enter this code execution branch, and the error is recorded in Diagnostics. This means it's still there! That's a twist!

Therefore, it is not enough to track the place where the compilation error is added - we have to explore it further.

After some digging in the code, we arrive at the DefiniteAssignmentPass.Analyze method, which initiates the analysis run. Among other things, the method checks that out parameters get initialized. In it, we find that the corresponding analysis runs twice:

// Run the strongest version of analysis
DiagnosticBag strictDiagnostics = analyze(strictAnalysis: true);
....
// Also run the compat (weaker) version of analysis to see 
// if we get the same diagnostics.
// If any are missing, the extra ones from the strong analysis 
// will be downgraded to a warning.
DiagnosticBag compatDiagnostics = analyze(strictAnalysis: false);

There is an interesting condition below:

// If the compat diagnostics did not overflow and we have the same 
// number of diagnostics, we just report the stricter set.
// It is OK if the strict analysis had an overflow here,
// causing the sets to be incomparable: the reported diagnostics will
// include the error reporting that fact.
if (strictDiagnostics.Count == compatDiagnostics.Count)
{
  diagnostics.AddRangeAndFree(strictDiagnostics);
  compatDiagnostics.Free();
  return;
}

The case is gradually becoming clearer. When we compile our code with MyStruct, strict and compat analysis produce the same number of diagnostics, so the strict set is reported as is.

If we change MyStruct to CancellationToken in our example, strictDiagnostics will contain 1 error (as we have already seen), and compatDiagnostics will have nothing.

As a result, the above condition is not met and the method does not return early. Where does the compilation error go? It turns into a mere warning:

HashSet<Diagnostic> compatDiagnosticSet 
  = new HashSet<Diagnostic>(compatDiagnostics.AsEnumerable(), 
                            SameDiagnosticComparer.Instance);
compatDiagnostics.Free();
foreach (var diagnostic in strictDiagnostics.AsEnumerable())
{
  // If it is a warning (e.g. WRN_AsyncLacksAwaits), 
  // or an error that would be reported by the compatible analysis, 
  // just report it.
  if (   diagnostic.Severity != DiagnosticSeverity.Error 
      || compatDiagnosticSet.Contains(diagnostic))
  {
    diagnostics.Add(diagnostic);
    continue;
  }

  // Otherwise downgrade the error to a warning.
  ErrorCode oldCode = (ErrorCode)diagnostic.Code;
  ErrorCode newCode = oldCode switch
  {
#pragma warning disable format
    ErrorCode.ERR_UnassignedThisAutoProperty 
      => ErrorCode.WRN_UnassignedThisAutoProperty,
    ErrorCode.ERR_UnassignedThis             
      => ErrorCode.WRN_UnassignedThis,
    ErrorCode.ERR_ParamUnassigned                   // <=      
      => ErrorCode.WRN_ParamUnassigned,
    ErrorCode.ERR_UseDefViolationProperty    
      => ErrorCode.WRN_UseDefViolationProperty,
    ErrorCode.ERR_UseDefViolationField       
      => ErrorCode.WRN_UseDefViolationField,
    ErrorCode.ERR_UseDefViolationThis        
      => ErrorCode.WRN_UseDefViolationThis,
    ErrorCode.ERR_UseDefViolationOut         
      => ErrorCode.WRN_UseDefViolationOut,
    ErrorCode.ERR_UseDefViolation            
      => ErrorCode.WRN_UseDefViolation,
    _ => oldCode, // rare but possible, e.g. 
                  // ErrorCode.ERR_InsufficientStack occurring in 
                  // strict mode only due to needing extra frames
#pragma warning restore format
  };

  ....
  var args 
     = diagnostic is DiagnosticWithInfo { 
         Info: { Arguments: var arguments } 
       } 
       ? arguments 
       : diagnostic.Arguments.ToArray();
  diagnostics.Add(newCode, diagnostic.Location, args);
}

What happens in our case when using CancellationToken? The loop traverses strictDiagnostics. As a quick reminder, it contains one error: an uninitialized out parameter. The then branch of the if statement is not executed, because diagnostic.Severity equals DiagnosticSeverity.Error and the compatDiagnosticSet collection is empty. Next, the compilation error code is mapped to a new code, that of a warning. After that, the warning is created and added to the resulting collection. This is how the compilation error turned into a warning. :)
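The merging of the two result sets can be modeled in a few lines. This is a simplified sketch, not Roslyn's code: the Diag record, the Severity enum, and the Merge method are hypothetical, and the ERR_ to WRN_ renaming stands in for the switch expression shown above. It assumes C# 9 or later for records.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Severity { Warning, Error }

// Hypothetical stand-in for a diagnostic; records compare by value.
public record Diag(string Code, Severity Severity, string Argument);

public static class DowngradeModel
{
    public static List<Diag> Merge(IReadOnlyList<Diag> strict, IReadOnlyList<Diag> compat)
    {
        // Same number of diagnostics: report the strict set unchanged.
        if (strict.Count == compat.Count)
        {
            return strict.ToList();
        }

        var compatSet = new HashSet<Diag>(compat);
        var result = new List<Diag>();
        foreach (Diag d in strict)
        {
            // Warnings, and errors that compat analysis also found, pass through.
            if (d.Severity != Severity.Error || compatSet.Contains(d))
            {
                result.Add(d);
                continue;
            }

            // Codes without a warning counterpart stay as they are.
            if (!d.Code.StartsWith("ERR_", StringComparison.Ordinal))
            {
                result.Add(d);
                continue;
            }

            // Otherwise the error is downgraded to a warning.
            result.Add(d with { Code = "WRN_" + d.Code.Substring(4), Severity = Severity.Warning });
        }

        return result;
    }

    public static void Main()
    {
        var strict = new List<Diag> { new Diag("ERR_ParamUnassigned", Severity.Error, "ct") };
        var compat = new List<Diag>(); // compat analysis found nothing

        foreach (Diag d in Merge(strict, compat))
        {
            Console.WriteLine($"{d.Code} {d.Severity}"); // prints "WRN_ParamUnassigned Warning"
        }
    }
}
```

Passing the same error in both lists leaves it as an error, which corresponds to the MyStruct case.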

By the way, this warning has a fairly low level. So when you run the compiler, it may not be visible unless you set the flag for issuing warnings of the appropriate level.

Let's run the compiler and specify an additional flag: csc.exe %pathToFile% -w:5

And we see the expected warning.

Now we have figured out where the compilation error disappears: it is replaced with a low-priority warning. However, we still haven't answered the question of what makes CancellationToken different from MyStruct. When analyzing the method with a MyStruct out parameter, compat analysis finds the error, whereas with a CancellationToken parameter it does not. Why is that?

Here I suggest grabbing a cup of tea or coffee, because we are about to get down to a painstaking investigation.

I hope you took the advice and got ready. So let's move on. :)

Remember the ReportUnassignedOutParameter method, in which the compilation error was recorded? Let's look at the method that calls it:

protected override void LeaveParameter(ParameterSymbol parameter, 
                                       SyntaxNode syntax, 
                                       Location location)
{
  if (parameter.RefKind != RefKind.None)
  {
    var slot = VariableSlot(parameter);
    if (slot > 0 && !this.State.IsAssigned(slot))
    {
      ReportUnassignedOutParameter(parameter, syntax, location);
    }

    NoteRead(parameter);
  }
}

The difference between strict and compat analysis when executing this method is that in the first case the slot variable has the value 1, and in the second it is -1. Therefore, in the second case, the then branch of the if statement is not executed. Now we need to find out why slot has the value -1 in the second case.

Look at the method LocalDataFlowPass.VariableSlot:

protected int VariableSlot(Symbol symbol, int containingSlot = 0)
{
  containingSlot = DescendThroughTupleRestFields(
                     ref symbol, 
                     containingSlot,                                   
                     forceContainingSlotsToExist: false);

  int slot;
  return 
    (_variableSlot.TryGetValue(new VariableIdentifier(symbol, 
                                                      containingSlot), 
                               out slot)) 
    ? slot 
    : -1;
}

In our case, _variableSlot does not contain a slot for the out parameter. Therefore, _variableSlot.TryGetValue(....) returns false, execution follows the alternative branch of the ?: operator, and the method returns -1. Now we need to understand why _variableSlot does not contain the out parameter.
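The interplay between the slot lookup and the check in LeaveParameter can be sketched with a plain dictionary. This is a toy model under simplifying assumptions: variables are identified by name, assigned slots are kept in a set, and SlotModel with its methods is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SlotModel
{
    // Returns the slot of a tracked variable, or -1 if it has no slot.
    public static int VariableSlot(Dictionary<string, int> slots, string name)
        => slots.TryGetValue(name, out int slot) ? slot : -1;

    // Mirrors the condition 'slot > 0 && !IsAssigned(slot)'.
    public static bool ShouldReport(Dictionary<string, int> slots,
                                    HashSet<int> assignedSlots,
                                    string name)
    {
        int slot = VariableSlot(slots, name);
        return slot > 0 && !assignedSlots.Contains(slot);
    }

    public static void Main()
    {
        var assigned = new HashSet<int>(); // the method body assigns nothing

        // Strict analysis: the out parameter received slot 1.
        var strictSlots = new Dictionary<string, int> { ["obj"] = 1 };
        Console.WriteLine(ShouldReport(strictSlots, assigned, "obj")); // prints "True"

        // Compat analysis: no slot was created for the parameter.
        var compatSlots = new Dictionary<string, int>();
        Console.WriteLine(ShouldReport(compatSlots, assigned, "obj")); // prints "False"
    }
}
```

A variable without a slot is simply not tracked, so the analysis has nothing to complain about.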

After digging around, we find the LocalDataFlowPass.GetOrCreateSlot method. It looks like this:

protected virtual int GetOrCreateSlot(
  Symbol symbol, 
  int containingSlot = 0, 
  bool forceSlotEvenIfEmpty = false, 
  bool createIfMissing = true)
{
  Debug.Assert(containingSlot >= 0);
  Debug.Assert(symbol != null);

  if (symbol.Kind == SymbolKind.RangeVariable) return -1;

  containingSlot 
    = DescendThroughTupleRestFields(
        ref symbol, 
        containingSlot,
        forceContainingSlotsToExist: true);

  if (containingSlot < 0)
  {
    // Error case. Diagnostics should already have been produced.
    return -1;
  }

  VariableIdentifier identifier 
    = new VariableIdentifier(symbol, containingSlot);
  int slot;

  // Since analysis may proceed in multiple passes, 
  // it is possible the slot is already assigned.
  if (!_variableSlot.TryGetValue(identifier, out slot))
  {
    if (!createIfMissing)
    {
      return -1;
    }

    var variableType = symbol.GetTypeOrReturnType().Type;
    if (!forceSlotEvenIfEmpty && IsEmptyStructType(variableType))
    {
      return -1;
    }

    if (   _maxSlotDepth > 0 
        && GetSlotDepth(containingSlot) >= _maxSlotDepth)
    {
      return -1;
    }

    slot = nextVariableSlot++;
    _variableSlot.Add(identifier, slot);
    if (slot >= variableBySlot.Length)
    {
      Array.Resize(ref this.variableBySlot, slot * 2);
    }

    variableBySlot[slot] = identifier;
  }

  if (IsConditionalState)
  {
    Normalize(ref this.StateWhenTrue);
    Normalize(ref this.StateWhenFalse);
  }
  else
  {
    Normalize(ref this.State);
  }

  return slot;
}

The method shows that there are a number of conditions under which it returns -1 and no slot is added to _variableSlot. If there is no slot for the variable yet and all checks pass, an entry is made in _variableSlot: _variableSlot.Add(identifier, slot). We debug the code and see that during strict analysis all checks pass, whereas during compat analysis the method exits in the following if statement:

var variableType = symbol.GetTypeOrReturnType().Type;
if (!forceSlotEvenIfEmpty && IsEmptyStructType(variableType))
{
  return -1;
}

The value of forceSlotEvenIfEmpty is false in both cases. The difference is in the value returned by IsEmptyStructType: false for strict analysis, true for compat analysis.

At this point I already have new questions and a desire to experiment. So it turns out that if the type of the out parameter is an "empty structure" (we will see what this means later), the compiler considers such code valid and does not generate an error, right? Let's remove the field from MyStruct in our example and compile it.

struct MyStruct
{  }

void CheckYourself(out MyStruct obj)
{
  // Do nothing
}

And this code compiles successfully! Interesting... I can't recall any mention of such features in the documentation or the specification. :)

Here comes another question: how does the code work when the type of the out parameter is CancellationToken? After all, this is clearly not an "empty structure". If you check out the code at referencesource.microsoft.com (link to CancellationToken), it becomes clear that this type contains methods, properties, and fields... It's still not clear, so let's keep digging.

Let's go back to the LocalDataFlowPass.IsEmptyStructType method:

protected virtual bool IsEmptyStructType(TypeSymbol type)
{
  return _emptyStructTypeCache.IsEmptyStructType(type);
}

Let's go deeper (EmptyStructTypeCache.IsEmptyStructType):

public virtual bool IsEmptyStructType(TypeSymbol type)
{
  return IsEmptyStructType(type, ConsList<NamedTypeSymbol>.Empty);
}

And even deeper:

private bool IsEmptyStructType(
  TypeSymbol type, 
  ConsList<NamedTypeSymbol> typesWithMembersOfThisType)
{
  var nts = type as NamedTypeSymbol;
  if ((object)nts == null || !IsTrackableStructType(nts))
  {
    return false;
  }

  // Consult the cache.
  bool result;
  if (Cache.TryGetValue(nts, out result))
  {
    return result;
  }

  result = CheckStruct(typesWithMembersOfThisType, nts);
  Debug.Assert(!Cache.ContainsKey(nts) || Cache[nts] == result);
  Cache[nts] = result;

  return result;
}

Execution continues in the EmptyStructTypeCache.CheckStruct method:

private bool CheckStruct(
  ConsList<NamedTypeSymbol> typesWithMembersOfThisType, 
  NamedTypeSymbol nts)
{
  .... 
  if (!typesWithMembersOfThisType.ContainsReference(nts))
  {
    ....
    typesWithMembersOfThisType 
      = new ConsList<NamedTypeSymbol>(nts, 
                                      typesWithMembersOfThisType);
    return CheckStructInstanceFields(typesWithMembersOfThisType, nts);
  }

  return true;
}

Here, the execution goes into the then branch of the if statement, since the typesWithMembersOfThisType collection is empty (see the EmptyStructTypeCache.IsEmptyStructType method, which passes it as an argument).

Things are getting clearer: now we understand what an "empty structure" is. Judging by the method names, it is a structure that does not contain instance fields. But let me remind you that CancellationToken does have instance fields. So we go the extra mile and check out the EmptyStructTypeCache.CheckStructInstanceFields method.

private bool CheckStructInstanceFields(
  ConsList<NamedTypeSymbol> typesWithMembersOfThisType, 
  NamedTypeSymbol type)
{
  ....
  foreach (var member in type.OriginalDefinition
                             .GetMembersUnordered())
  {
    if (member.IsStatic)
    {
      continue;
    }
    var field = GetActualField(member, type);
    if ((object)field != null)
    {
      var actualFieldType = field.Type;
      if (!IsEmptyStructType(actualFieldType, 
                             typesWithMembersOfThisType))
      {
        return false;
      }
    }
  }

  return true;
}

The method iterates over instance members and gets the "actual field" for each of them. If we manage to get this value (field is not null), we check whether the type of this field is an "empty structure". So if we find at least one field whose type is not an "empty structure", the original type is not considered an "empty structure" either. If all the instance fields are "empty structures", then the original type is also considered an "empty structure".

We'll have to go a little deeper. Don't worry, our dive will be over soon, and we'll dot the i's. :)

Look at the method EmptyStructTypeCache.GetActualField:

private FieldSymbol GetActualField(Symbol member, NamedTypeSymbol type)
{
  switch (member.Kind)
  {
    case SymbolKind.Field:
      var field = (FieldSymbol)member;
      ....
      if (field.IsVirtualTupleField)
      {
        return null;
      }

      return (field.IsFixedSizeBuffer || 
              ShouldIgnoreStructField(field, field.Type)) 
            ? null 
            : field.AsMember(type);

    case SymbolKind.Event:
      var eventSymbol = (EventSymbol)member;
      return (!eventSymbol.HasAssociatedField || 
              ShouldIgnoreStructField(eventSymbol, eventSymbol.Type)) 
            ? null 
            : eventSymbol.AssociatedField.AsMember(type);
  }

  return null;
}

Accordingly, for the CancellationToken type, we are interested in the SymbolKind.Field case branch. We can only get into it when analyzing the m_source member, because it is the only instance field of CancellationToken.

Let's walk through the evaluation in this branch.

field.IsVirtualTupleField is false. We move on to the conditional operator and evaluate the condition field.IsFixedSizeBuffer || ShouldIgnoreStructField(field, field.Type). field.IsFixedSizeBuffer is not our case: as expected, its value is false. As for the value returned by ShouldIgnoreStructField(field, field.Type), it differs between strict and compat analysis. A quick reminder: we are analyzing the same field of the same type.

Here is the body of the EmptyStructTypeCache.ShouldIgnoreStructField method:

private bool ShouldIgnoreStructField(Symbol member, 
                                     TypeSymbol memberType)
{
  // when we're trying to be compatible with the native compiler, we 
  // ignore imported fields (an added module is imported)
  // of reference type (but not type parameters, 
  // looking through arrays)
  // that are inaccessible to our assembly.

  return _dev12CompilerCompatibility &&                             
         ((object)member.ContainingAssembly != _sourceAssembly ||   
          member.ContainingModule.Ordinal != 0) &&                      
         IsIgnorableType(memberType) &&                                 
         !IsAccessibleInAssembly(member, _sourceAssembly);          
}

Let's see what is different for strict and compat analysis. Well, you may have already guessed on your own. :)

Strict analysis: _dev12CompilerCompatibility is false, hence the result of the entire expression is false. Compat analysis: all subexpressions are true, so the result of the entire expression is true.

And now we follow the chain of conclusions, rising to the top from the very end. :)

In compat analysis, we decide to ignore the single instance field, m_source, which is of the CancellationTokenSource type. Thus CancellationToken is considered an "empty structure", so no slot is created for it and nothing is added to _variableSlot. Since there is no slot, we do not process the out parameter and do not record a compilation error during compat analysis. As a result, strict and compat analysis give different results, which is why the compilation error is downgraded to a low-priority warning.

In other words, there is no special handling of the CancellationToken type. There are a number of types for which a missing out parameter initialization does not lead to a compilation error.
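The whole chain can be approximated with reflection. The sketch below is only a rough model of the "empty structure" check, not the compiler's implementation: it works on runtime types instead of symbols, treats primitives and enums as non-empty, and reduces the accessibility test to "the field is not public". EmptyStructModel, Empty, and Wrapper are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Threading;

public struct Empty { }

public struct Wrapper
{
    public Empty Field; // the only field is itself an "empty structure"
}

public static class EmptyStructModel
{
    private const BindingFlags InstanceFields =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    public static bool IsEmptyStruct(Type type, Assembly sourceAssembly, bool compat)
        => IsEmptyStruct(type, sourceAssembly, compat, new HashSet<Type>());

    private static bool IsEmptyStruct(Type type, Assembly sourceAssembly,
                                      bool compat, HashSet<Type> seen)
    {
        // Assumption: only non-primitive, non-enum structs can be "empty".
        if (!type.IsValueType || type.IsPrimitive || type.IsEnum)
        {
            return false;
        }

        // Seen before: either a cycle or a type already found to be empty.
        if (!seen.Add(type))
        {
            return true;
        }

        foreach (FieldInfo field in type.GetFields(InstanceFields))
        {
            if (compat && ShouldIgnoreField(field, sourceAssembly))
            {
                continue; // compat mode pretends the field does not exist
            }

            if (!IsEmptyStruct(field.FieldType, sourceAssembly, compat, seen))
            {
                return false; // one non-empty field makes the struct non-empty
            }
        }

        return true;
    }

    // Simplified: imported, reference-typed (looking through arrays), non-public.
    private static bool ShouldIgnoreField(FieldInfo field, Assembly sourceAssembly)
    {
        Type fieldType = field.FieldType;
        while (fieldType.IsArray)
        {
            fieldType = fieldType.GetElementType();
        }

        bool isReferenceType = (fieldType.IsClass || fieldType.IsInterface)
                               && !fieldType.IsGenericParameter
                               && !fieldType.IsPointer;

        return field.DeclaringType.Assembly != sourceAssembly
               && isReferenceType
               && !field.IsPublic;
    }

    public static void Main()
    {
        Assembly me = typeof(EmptyStructModel).Assembly;
        Console.WriteLine(IsEmptyStruct(typeof(CancellationToken), me, compat: false));
        Console.WriteLine(IsEmptyStruct(typeof(CancellationToken), me, compat: true));
        Console.WriteLine(IsEmptyStruct(typeof(TimeSpan), me, compat: true));
        Console.WriteLine(IsEmptyStruct(typeof(Wrapper), me, compat: false));
    }
}
```

In this model, CancellationToken counts as empty only in compat mode, TimeSpan is never empty because its field is a primitive, and Wrapper is empty in both modes.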

Let's try to see in practice which types will be successfully compiled. As usual, we take our typical method:

void CheckYourself(out MyType obj)
{
  // Do nothing
}

And try to substitute different types instead of MyType. We've already figured out that this code compiles successfully for CancellationToken and for an empty structure. What else?

struct MyStruct
{ }

struct MyStruct2
{
  private MyStruct _field;
}

If we use MyStruct2 instead of MyType, the code also compiles successfully.

public struct MyExternalStruct
{
  private String _field;
}

With this type, the code compiles successfully if MyExternalStruct is declared in an external assembly. If MyExternalStruct is declared in the same assembly as the CheckYourself method, the code does not compile.

If we change the access modifier of _field from private to public, the code no longer compiles even when the type comes from an external assembly:

public struct MyExternalStruct
{
  public String _field;
}

The code does not compile with the following change either, where the field type is changed from String to int:

public struct MyExternalStruct
{
  private int _field;
}

As you may have guessed, there is a certain scope for experimentation.

Let's recap

Generally speaking, out parameters must be initialized before the called method returns control to the caller. However, as practice shows, the compiler makes its own adjustments to this requirement. In some cases, a low-level warning is issued instead of a compilation error. We discussed exactly why this happens in the previous section.

But for which types can you skip initializing out parameters? For example, initialization is not required if the type is a structure with no fields, or if all of its fields are structures with no fields. CancellationToken is another case: this type lives in an external library, its only field, m_source, is of a reference type, and the field itself is not accessible from external code. For these reasons, compilation succeeds. You can come up with other similar types for which you can leave out parameters uninitialized and still compile your code successfully.
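If you would rather not depend on this behavior, one option is to assign the out parameter explicitly, even for such types. Here is a minimal sketch; the GetToken method is hypothetical.

```csharp
using System;
using System.Threading;

public static class ExplicitOutDemo
{
    // Hypothetical method: the out parameter is assigned explicitly,
    // so the code does not rely on the compat analysis described above.
    public static void GetToken(out CancellationToken ct)
    {
        ct = default;
    }

    public static void Main()
    {
        GetToken(out CancellationToken token);
        Console.WriteLine(token.CanBeCanceled); // prints "False"
    }
}
```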

Going back to the question from the beginning of the article:

void CheckYourself(out MyStruct obj)
{
  // Do nothing
}
public struct MyStruct
{ .... }

Does this code compile? As you now understand, neither "yes" nor "no" is the correct answer. Depending on what MyStruct is, which fields it has, where the type is declared, and so on, this code may or may not compile.

Conclusion

What we did today was dive into the compiler's source code to answer a seemingly simple question. I think we will repeat this experience soon, since the topic for the next article of this kind is already there. Stay in touch. ;)

By the way, subscribe to my Twitter account, where I also post articles and other interesting findings. This way you won't miss anything exciting. :)
