NDepend

People who knows me also knows that I'm a big fan of static analysis tools. Thanks to Patrick Smacchia I had the chance to test NDepend, probably the most complete tool for static code analysis. It is not worthy to enumerate all the features the tool offers, because the documentation and the web site already do a wonderful job with it. Just to mention some of them with NDepend you will be able to measure up to 82 predefined metrics to manage things like coupling, instability, abstractness, dependencies, naming conventions ...

Aside all the different, some of them unique, metrics the tool incorporates there are many features that make of NDepend a great tool like:

  • Possibility to incorporate new custom rules or modify the existing ones via the Code Query Language, a language with a syntax similar to any SQL language that allows you writing queries against your code structure. CQL makes so easy to create custom rules that there is no fair comparison with FxCop or StyleCop.
  • Possibility to introduce CQL constraints in the code that will allow you building really creative rules embedded in your code.
  • MSBuild and NAnt tasks to integrate with your TFS or CruiseControl.
  • Integration with Visual Studio and Reflector.
  • Different graphs that allow you seeing your code in a visual manner.

It is also remarkable the job Patrick has done by creating several demos on how to perform different operations with NDepend to take more profit of it. You can find them online, together with the description of the different metrics and links to technical articles to his blog. It is just pity that you cannot link easier to it from the tool itself. It would be fantastic if in next versions you could just right click in the failing rule and provide an option to reach the help.

Regardless I think the tool is great I must say that I didn't like some of the naming conventions rules do not follow the guidelines provided by MSDN and one of the greatest books about the subject: Framework Design Guidelines. I also found some of the metrics a little bit arbitrary, like code should have more than 20% of comments, I completely understand you need to put a value to break a rule, but I would love to see more detailed information on how he arrived to the conclusion that some rules should have a specific value instead of another one. Of course, this is part of his .NET knowledge and programming experience. Anyway, this shouldn't be a limitation because if for any reason you do not agree with a rule, just modify it, disable it ... you can do it if you have good reasons for it!!

I like to see the tool not only as a bunch of rules, but as the infrastructure that will allow to build and manage your corporate standards.

I want to finish the post saying that I like NDepend, I think this is one of the must-have developer tools, so if you haven't tried yet go the website and get an evaluation copy to check it.

Static Code Analysis

These days I've been updating the CA settings on some old projects, I wanted to modify the configuration of certain rules that were removed with the shipping of Visual Studio 2008. You can obtain detailed information about what rules are shipped with each version of CA in this post of the FxCop blog.

As you can see in the post, some of the rules were removed because they do not apply anymore or because they were too noisy compared with the benefit introduced. One of the rules that make feel more upset when I knew it was removed was "CA1818 - Do not concatenate strings inside loops". This rule threw an error (I will continue in my imaginary world where everybody sets warnings as errors in production code) with code like:

string s = string.Empty;
for (int i = 0; i < 5; i++)
{
   s += i.ToString();
}

One of the first things you learn in .NET is about the immutability of the strings, you can find lot of literature talking about how to handle properly strings. Even Improving .NET Application Performance and Scalability, one of the best papers I've seen about .NET performance, makes an explicit reference to do not concatenate strings when the number of concatenations is unknown.

So, today that I've been working again with the rule, I had the curiosity (hope) to verify the rule was not removed because it was too noisy but because the CLR was improved to avoid the issue. I know this is again living in my imaginary world, but I'm a bit stubborn...what I've done is to compile a console project with the code above with 2.0 (VS 2005) and 3.5 (VS 2008), first one fires the error, second one doesn't. First step has been to look for differences in the IL generated, both cases have:

   1:  .entrypoint
   2:  // Code size       39 (0x27)
   3:  .maxstack  2
   4:  .locals init ([0] string s,
   5:         [1] int32 i)
   6:  IL_0000:  ldsfld   string [mscorlib]System.String::Empty
   7:  IL_0005:  stloc.0
   8:  IL_0006:  ldc.i4.0
   9:  IL_0007:  stloc.1
  10:  IL_0008:  br.s     IL_001c
  11:  IL_000a:  ldloc.0
  12:  IL_000b:  ldloca.s i
  13:  IL_000d:  call     instance string 
  14:                     [mscorlib]System.Int32::ToString()
  15:  IL_0012:  call     string [mscorlib]System.String::Concat(string,
  16:                                                            string)
  17:  IL_0017:  stloc.0
  18:  IL_0018:  ldloc.1
  19:  IL_0019:  ldc.i4.1
  20:  IL_001a:  add
  21:  IL_001b:  stloc.1
  22:  IL_001c:  ldloc.1
  23:  IL_001d:  ldc.i4.5
  24:  IL_001e:  blt.s    IL_000a
  25:  IL_0020:  call     valuetype [mscorlib]System.ConsoleKeyInfo 
  26:                     [mscorlib]System.Console::ReadKey()
  27:  IL_0025:  pop
  28:  IL_0026:  ret

No improvements on the compiler side. Next step has been to verify the native code generated, for that I used the command !u at WinDbg (in the post Inline Methods you can see how to obtain the native code after the method is jitted). The code for both projects looks like (note that some memory addresses will be different):

   1:  CA1818.Program.Main(System.String[])
   2:  Begin 009d0070, size 55
   3:  009d0070 55           push ebp
   4:  009d0071 8bec         mov  ebp,esp
   5:  009d0073 57           push edi
   6:  009d0074 56           push esi
   7:  009d0075 83ec10       sub  esp,10h
   8:  009d0078 33c0         xor  eax,eax
   9:  009d007a 8945f4       mov  dword ptr [ebp-0Ch],eax
  10:  009d007d 8b3d2c10d602 mov  edi,dword ptr ds:[2D6102Ch] ("")
  11:  009d0083 33d2         xor  edx,edx
  12:  009d0085 8955f4       mov  dword ptr [ebp-0Ch],edx
  13:  009d0088 837df405     cmp  dword ptr [ebp-0Ch],5
  14:  009d008c 7d26         jge  009d00b4
  15:  009d008e 8b75f4       mov  esi,dword ptr [ebp-0Ch]
  16:  009d0091 e80a0cca6f   call mscorlib_ni+0x220ca0 (70670ca0) 
  17:  009d0096 50           push eax
  18:  009d0097 8bce         mov  ecx,esi
  19:  009d0099 33d2         xor  edx,edx
  20:  009d009b e8bad00971   call 
  21:                        mscorwks!LogHelp_TerminateOnAssert+0xb82 
  22:                        (71a6d15a) 
  23:  009d00a0 8bd0         mov  edx,eax
  24:  009d00a2 8bcf         mov  ecx,edi
  25:  009d00a4 e8a7ebc36f   call mscorlib_ni+0x1bec50 (7060ec50)
  26:  009d00a9 8bf8         mov  edi,eax
  27:  009d00ab ff45f4       inc  dword ptr [ebp-0Ch]
  28:  009d00ae 837df405     cmp  dword ptr [ebp-0Ch],5
  29:  009d00b2 7cda         jl   009d008e
  30:  009d00b4 8d4de8       lea  ecx,[ebp-18h]
  31:  009d00b7 33d2         xor  edx,edx
  32:  009d00b9 e8f6381570   call mscorlib_ni+0x6d39b4 (70b239b4)
  33:  009d00be 8d65f8       lea  esp,[ebp-8]
  34:  009d00c1 5e           pop  esi
  35:  009d00c2 5f           pop  edi
  36:  009d00c3 5d           pop  ebp
  37:  009d00c4 c3           ret

We see there are no improvements either on the jitter side. The next to verify is if the strings are being discarded on each iteration creating a new one or not. To do that I used some good profilers that are on the market but in the end I decided to show how I did it with WinDbg because anybody can download it for free.

With !DumpHeap -type System.String we can see all the string instances of our application. This returns a list with the memory address of the string instances, to view the contents of the object we just need to do a !do (dump object) of the address we want to check. So, just taking some samples from the instances with higher memory addresses we can already see the next:

   1:  0:003> !do 01c83ae0 
   2:  Name: System.String
   3:  MethodTable: 706c0a00
   4:  EEClass: 7047d64c
   5:  Size: 24(0x18) bytes
   6:  String: 012
   7:   
   8:  0:003> !do 01c83af8 
   9:  Name: System.String
  10:  MethodTable: 706c0a00
  11:  EEClass: 7047d64c
  12:  Size: 20(0x14) bytes
  13:  String: 3
  14:   
  15:  0:003> !do 01c83b0c
  16:  Name: System.String
  17:  MethodTable: 706c0a00
  18:  EEClass: 7047d64c
  19:  Size: 26(0x1a) bytes
  20:  String: 0123
  21:   
  22:  0:003> !do 01c83b28
  23:  Name: System.String
  24:  MethodTable: 706c0a00
  25:  EEClass: 7047d64c
  26:  Size: 20(0x14) bytes
  27:  String: 4
  28:   
  29:  0:003> !do 01c83b3c 
  30:  Name: System.String
  31:  MethodTable: 706c0a00
  32:  EEClass: 7047d64c
  33:  Size: 28(0x1c) bytes
  34:  String: 01234

From the results above we can see how on each iteration we have two strings, the resulting of i.ToString() and the resulting of concatenation.

Conclusion, we still creating new strings and discarding the previous version for GC on each manipulation of the string. Therefore, I suppose the rule was removed just because it was too noisy.

Once I stopped playing with WinDbg I come back to the earth to say what I wanted to say from the beginning. It's pity to see that a performance issue that has been repeated so many times, now is ignored just because the rule is too noisy. I know lot of people could argue that using the StringBuilder we also discard old versions of strings when the size of the string becomes bigger than the buffer, but still better than discarding all modifications.

I don't understand why a good string handling was that important before and now is just ignored by the main CA tool used by .NET developers.

Xml Documentation Comments - Exceptions II

People who has been working with me knows that I'm a big fan of FxCop (or Code Analysis), I think static code analysis tools are very useful and a must-have in any development team, since ensure developers follow the company's guidelines and  write correct code (at least it helps a lot). In addition, I see them also as a great ally to improve the learning curve of new developers.

While FxCop analyzes managed code assemblies checking for improvements  about design, localization, performance, security, interoperability, etc.  StyleCop focuses mainly in code style guidelines.

In the previous post Xml Documentation Comments - Exceptions I we saw how to document exceptions using the Xml Documentation features. Here we will see how to take profit of StyleCop SDK  to create a custom rule that validates we are documenting those exceptions.

To create our rule we just need to create a class library project, add references to the assemblies Microsoft.StyleCop.dll and Microsoft.StyleCop.CSharp.dll, create a class that inherits from SourceAnalyzer, add an attribute and embed an Xml file with the necessary Metatada for our rules. All these steps are entirely documented in the StyleCop SDK documentation, so I do not waste too much time explaining it.

Before we start we need to know that the StyleCop engine incorporates a custom C# language service that is used to create a rich model representing the original C# code. What our rule will do is to use that model to visit the different elements that can throw exceptions, analyze the element's body to find the throw statements and extract the type of that exceptions, it will extract the exceptions documented using the Xml comments, finally it will compare the thrown exceptions with the documented ones and will add a violation for all the discrepancies found.

We know that we can use throw statements inside methods (including constructors), indexers and properties. So, taking profit of the Visitor implementation that the API provide us we will visit all the CsElement of a CsDocument looking for the declared methods, indexers and properties. Below you can see the callback method we have passed to the WalkDocument method.

   1:  private bool VisitElements(
   2:      CsElement element, 
   3:      CsElement parentElement, 
   4:      List<string> usingDirectives)
   5:  {
   6:      if (this.Cancel)
   7:      {
   8:          return false;
   9:      }
  10:   
  11:      // We store the using directives for future use
  12:      if (element.ElementType == ElementType.UsingDirective)
  13:      {
  14:          usingDirectives.Add(
  15:              ((UsingDirective)element).NamespaceType);
  16:      }
  17:   
  18:      // There are severeal element types that can throw exceptions: 
  19:      // Methods, Constructors, Indexers and Properties
  20:      if (!element.Generated &&
  21:          (element.ElementType == ElementType.Method || 
  22:           element.ElementType == ElementType.Constructor ||
  23:           element.ElementType == ElementType.Indexer || 
  24:           element.ElementType == ElementType.Property))
  25:      {
  26:          this.CheckExceptionDocumentation(element, 
  27:              usingDirectives);
  28:      }
  29:   
  30:      return true;
  31:  }

Lines 17 to 24 allow us filtering the elements for which we are going to check the documentation. You can see also from line 10 to 13 that we store the using directives defined in the document, later on I will explain why.

The next is to analyze more in detail each element we have filtered, what we will do is first look for the exception tags added  in the Xml comments and then compare them with the type of the exceptions thrown in the given element. The first step is as easy as perform a Linq Xml query over a property called Header the CsElement has. For the second one we will create a CodeWalkerStatementVisitor that we will pass to the WalkElement method, this callback will be used to iterate the element's statements looking for all the the throw statements defined.

To do it we only need to check the StatementType property belogning to the class Statement and make sure its type is StatementType.Throw. Once we have the ThrowStatement we can investigate the type of the exception that will be thrown. The demo rule can analyze the next scenarios:

  1. New expressions. i.e. throw new System.Exception();
  2. Literal expressions. i.e. var ex = new Exception(); throw ex;
  3. Method invocation expressions. i.e. throw GetMyExceptionInstance();

The ThrowStatement class has a property called ThrownExpression that allow us identifying the expression used to throw the exception.

For the first scenario, we need to iterate the different tokens of the NewExpression and extract the exception type. This is done in the code using the next method.

   1:  private static string ExtractExceptionType(NewExpression expression)
   2:  {
   3:      for (Node<CsToken> node = expression.Tokens.First; node != null; 
   4:          node = node.Next)
   5:      {
   6:          CsToken token = node.Value;
   7:          if (token.CsTokenType == CsTokenType.Other)
   8:          {
   9:              return token.Text;
  10:          }
  11:      }
  12:   
  13:      return string.Empty;
  14:  }

For the second scenario we have a small handicap, the literal represents a variable that is not defined in the throw statement, instead it is defined in upper statements or even in elements different than the one where the exception is thrown i.e.

   1:  ArgumentNullException ex = new ArgumentNullException();
   2:  public void DoSomething(string value)
   3:  {
   4:      if (value == null)
   5:      {
   6:          throw new ex;
   7:      }
   8:  }

In the code above the element containing the throw statement is the method DoSomething, but the variable ex is defined in a class element.

For this reason before we can obtain the type of the exception, we first need to retrieve the variables defined by the parent statements of the throw statement. After that we just need to find the variable among the retrieved ones to determine its type and return it for further analysis.

The third scenario is the MethodInvocationExpression, here we have the biggest handicaps because there are many possibilities in which we can obtain an exception from a method. The easiest case is a direct method call, we will have MethodInvocationExpression containing the LiteralExpression with the name of the method, but imagine the next situation:

throw new MyClass().GetException();

In this case, the expression is a member access and we need to go to the right side to find the literal with the name of the method.

Once we have the obtained the name of the method, we can perform a search in the Document looking for that Method element in order to retrieve its return type that, of course, will be our exception type. We do not need to worry about overloads with different return types, because even if the CLR supports it C# does not and we are analyzing C# code.

At this point we have been able to retrieve all the exception types thrown by the throw statements and we can compare with the ones we have documented. Unfortunately we are not done yet and there is another point to solve.

The problem we have is that the object model representing the code we analyze, does not stores the type of the variables , return types of methods, etc. as the .NET type "Type". Instead they are represented in best case as a TokenType with a text representation. This has sense for many reasons, one is because StyleCop analyzes code that doesn't necessarily compiles therefore it's possible the type does not exists yet.

So, how this affects to the rule? Since we do not have the real types for the Exceptions we want to verify but string representations of them, the next code will cause a false positive:

   1:  /// <exception cref="System.Exception"></exception>
   2:  public void DoSomething()
   3:  {
   4:      throw new Exception();
   5:  }

The false positive appears because at the time we compare the type throw with the type documented we will compare "System.Exception" with "Exception". That's the reason why we have stored the using directives in a previous step, because before to compare we will try to obtain the full type name doing a Type.GetType() concatenating the different namespaces imported into the document. In case, that we are still not able to get the type (i.e. exception defined in a different assembly not accessible from StyleCop) we will compare both texts without the namespace, this is using Exception instead of System.Exception.

I'm really curious to see if StyleCop will be benefited when the new managed C# compiler is there, part of the project to offer the C# compiler as a service will include a complete language object model. I suppose it will be a natural step to use that new model, but this is way far from today since, regardless the C# team is already working on it, it is supposed to come after C# 4.0.

With this rule we are covering the most common scenarios where exceptions are thrown, but there are cases that will fail. i.e. If the exception thrown is obtained from a method declared in a different document, the rule will not be able to find that method and consequently the type of the exception.

StyleCopRule.zip (21.66 kb)

Xml Documentation Comments - Exceptions I

Long time ago my team was investigating about a new beta technology called .NET, we needed to compare it against Java and decide what would be our new development platform (I guess it's clear which one we successfully decided to use). One of the things I liked about Java were Checked Exceptions, I thought it was handy to  have this enforced by the compiler. Nowadays I'm 100% convinced I wouldn't like to have them in .NET and it seems I'm not the only one. If you are curious to know why checked exceptions where not added to .NET you can read the interview done to Anders Hejlsberg some years after the beta release.

In any case, it's clear that even if we do not want the compiler forces us to handle exceptions, there is no doubt that the information about the exceptions that can be thrown by a method are very useful for developers. In order to help us with this we can make use of the Xml Documentation Comments, to do it we have the exception tag that allow us introducing the information regarding the exceptions  thrown by our Constructors, Methods, Properties and Indexers, this information will then be listed not only in the documentation we generate with tools like Sandcastle or NDoc  but also when we use Intellisense.

Below you can see an example about how to make use of it.

   1:  /// <summary>
   2:  /// Does something
   3:  /// </summary>
   4:  /// <param name="value">A string value</param>
   5:  /// <exception cref="System.ArgumentNullException">
   6:  /// Thrown when value is null
   7:  /// </exception>
   8:  public void DoSomething(string value)
   9:  {
  10:      if (value == null)
  11:      {
  12:          throw new System.ArgumentNullException("value");
  13:      }
  14:  }

The compiler gives us some help to make sure we write the Xml comments, but unfortunately it is not enabled by default. To do it we need to go to the build properties of our project and configure it to generate the documentation files.

xml documentation file

After we configure it the compiler will generate errors (warnings if you do not work with "Treat warnings as errors", but I like to think nobody has this option disabled in professional software) for the public or visible elements that do not have the Xml comments.  This is a good start point, the compiler validates the Xml comments exist, but does not check the validity of them. i.e. somebody can add the tags and let the values empty. Therefore if we want to check our documentation is properly formatted we need to look for static analysis tools that do this work.

In the next part of this post we will see how to create a custom rule for StyleCop that checks the exceptions thrown by our code are documented.