Static Code Analysis

Lately I've been updating the Code Analysis (CA) settings on some old projects, because I wanted to modify the configuration of certain rules that were removed when Visual Studio 2008 shipped. You can find detailed information about which rules ship with each version of CA in this post on the FxCop blog.

As you can see in that post, some rules were removed because they no longer apply or because they were too noisy compared with the benefit they provided. One of the removals that upset me most was "CA1818 - Do not concatenate strings inside loops". This rule raised an error (I will stay in my imaginary world where everybody treats warnings as errors in production code) for code like:

string s = string.Empty;
for (int i = 0; i < 5; i++)
{
   s += i.ToString();
}

One of the first things you learn in .NET is that strings are immutable, and you can find a lot of literature on how to handle strings properly. Even Improving .NET Application Performance and Scalability, one of the best papers I've seen about .NET performance, explicitly advises against concatenating strings when the number of concatenations is unknown.

So today, working with the rule again, I was curious (hopeful) to verify whether the rule was removed not because it was too noisy but because the CLR had been improved to avoid the issue. I know this is living in my imaginary world again, but I'm a bit stubborn... What I did was compile a console project containing the code above with 2.0 (VS 2005) and with 3.5 (VS 2008); the first one fires the error, the second one doesn't. The first step was to look for differences in the generated IL. Both cases produce:

.entrypoint
// Code size       39 (0x27)
.maxstack  2
.locals init ([0] string s,
       [1] int32 i)
IL_0000:  ldsfld   string [mscorlib]System.String::Empty
IL_0005:  stloc.0
IL_0006:  ldc.i4.0
IL_0007:  stloc.1
IL_0008:  br.s     IL_001c
IL_000a:  ldloc.0
IL_000b:  ldloca.s i
IL_000d:  call     instance string [mscorlib]System.Int32::ToString()
IL_0012:  call     string [mscorlib]System.String::Concat(string, string)
IL_0017:  stloc.0
IL_0018:  ldloc.1
IL_0019:  ldc.i4.1
IL_001a:  add
IL_001b:  stloc.1
IL_001c:  ldloc.1
IL_001d:  ldc.i4.5
IL_001e:  blt.s    IL_000a
IL_0020:  call     valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
IL_0025:  pop
IL_0026:  ret

No improvements on the compiler side. The next step was to inspect the generated native code, for which I used the !u command in WinDbg (the post Inline Methods shows how to obtain the native code after a method is jitted). The code for both projects looks like this (note that some memory addresses will differ):

CA1818.Program.Main(System.String[])
Begin 009d0070, size 55
009d0070 55           push ebp
009d0071 8bec         mov  ebp,esp
009d0073 57           push edi
009d0074 56           push esi
009d0075 83ec10       sub  esp,10h
009d0078 33c0         xor  eax,eax
009d007a 8945f4       mov  dword ptr [ebp-0Ch],eax
009d007d 8b3d2c10d602 mov  edi,dword ptr ds:[2D6102Ch] ("")
009d0083 33d2         xor  edx,edx
009d0085 8955f4       mov  dword ptr [ebp-0Ch],edx
009d0088 837df405     cmp  dword ptr [ebp-0Ch],5
009d008c 7d26         jge  009d00b4
009d008e 8b75f4       mov  esi,dword ptr [ebp-0Ch]
009d0091 e80a0cca6f   call mscorlib_ni+0x220ca0 (70670ca0)
009d0096 50           push eax
009d0097 8bce         mov  ecx,esi
009d0099 33d2         xor  edx,edx
009d009b e8bad00971   call mscorwks!LogHelp_TerminateOnAssert+0xb82 (71a6d15a)
009d00a0 8bd0         mov  edx,eax
009d00a2 8bcf         mov  ecx,edi
009d00a4 e8a7ebc36f   call mscorlib_ni+0x1bec50 (7060ec50)
009d00a9 8bf8         mov  edi,eax
009d00ab ff45f4       inc  dword ptr [ebp-0Ch]
009d00ae 837df405     cmp  dword ptr [ebp-0Ch],5
009d00b2 7cda         jl   009d008e
009d00b4 8d4de8       lea  ecx,[ebp-18h]
009d00b7 33d2         xor  edx,edx
009d00b9 e8f6381570   call mscorlib_ni+0x6d39b4 (70b239b4)
009d00be 8d65f8       lea  esp,[ebp-8]
009d00c1 5e           pop  esi
009d00c2 5f           pop  edi
009d00c3 5d           pop  ebp
009d00c4 c3           ret

So there are no improvements on the jitter side either. The next thing to verify is whether each iteration discards the previous string and creates a new one. I tried some of the good profilers on the market, but in the end I decided to show how I did it with WinDbg because anybody can download it for free.

With !DumpHeap -type System.String we can see all the string instances in our application. It returns a list with the memory address of each string instance; to view the contents of an object we just run !do (dump object) on the address we want to check. Taking a few samples from the instances with the highest memory addresses, we see the following:

0:003> !do 01c83ae0
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 24(0x18) bytes
String: 012

0:003> !do 01c83af8
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 20(0x14) bytes
String: 3

0:003> !do 01c83b0c
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 26(0x1a) bytes
String: 0123

0:003> !do 01c83b28
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 20(0x14) bytes
String: 4

0:003> !do 01c83b3c
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 28(0x1c) bytes
String: 01234

From the results above we can see that each iteration produces two strings: the result of i.ToString() and the result of the concatenation.

In conclusion, we are still creating a new string and leaving the previous version for the GC on each manipulation of the string. Therefore, I suppose the rule was removed just because it was too noisy.
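
The same observation can be made from managed code, without a debugger. The following is only a minimal sketch: the class and method names (ConcatInstancesSketch, CollectIntermediates) are made up for illustration. It runs the loop from the beginning of the post, keeps a reference to every intermediate value of s, and then checks whether two consecutive values are the same object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original project.
static class ConcatInstancesSketch
{
    // Runs the loop from the post and keeps a reference to every
    // intermediate value of s so the instances can be compared afterwards.
    public static List<string> CollectIntermediates(int count)
    {
        List<string> intermediates = new List<string>();
        string s = string.Empty;
        for (int i = 0; i < count; i++)
        {
            s += i.ToString();   // compiles to String.Concat(s, i.ToString())
            intermediates.Add(s);
        }
        return intermediates;
    }

    static void Main()
    {
        List<string> values = CollectIntermediates(5);
        for (int i = 1; i < values.Count; i++)
        {
            // A different object on every iteration means the previous
            // string was not modified in place.
            bool sameObject = object.ReferenceEquals(values[i - 1], values[i]);
            Console.WriteLine("{0} -> {1}, same instance: {2}",
                values[i - 1], values[i], sameObject);
        }
    }
}
```

Keeping the references in a list prevents the GC from collecting them, so this only illustrates the allocation pattern; it is not a measurement of its cost.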

Once I stopped playing with WinDbg, I came back down to earth to say what I wanted to say from the beginning. It's a pity that a performance issue that has been pointed out so many times is now ignored just because the rule is too noisy. I know a lot of people could argue that StringBuilder also discards old buffers when the string grows beyond its capacity, but that is still better than discarding the result of every single modification.

I don't understand why good string handling was so important before and is now simply ignored by the main CA tool used by .NET developers.
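
For reference, this is roughly what the loop looks like when rewritten with StringBuilder, which is the alternative the old rule pointed to. It is a sketch only; the names BuilderSketch and BuildDigits are invented for the example.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original project.
static class BuilderSketch
{
    // Same result as the concatenation loop, but the characters are
    // appended to the builder's internal buffer instead of producing
    // a new string on every iteration.
    public static string BuildDigits(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append(i);   // appends the decimal text of i
        }
        return builder.ToString();
    }

    static void Main()
    {
        Console.WriteLine(BuildDigits(5));   // prints 01234
    }
}
```

For a loop of five iterations the difference is negligible; the rule mattered for loops whose iteration count is large or unknown.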

Static Fields

"Types may declare locations that are associated with the type rather than any particular value of the type. Such locations are static fields of the type. As such, static fields declare a location that is shared by all values of the type. Just like non-static (instance) fields, a static field is typed and that type never changes. Static fields are always restricted to a single application domain basis, but they may also be allocated on a per-thread basis."

The paragraph above is the definition of a static field taken from the CLI specification. It means that when we create a static field "f" inside a class "C", the value does not belong to any of the instances of C we create; instead, its value is shared across all instances and therefore "belongs" to the type C itself.

I've created a very simple class with one static and two instance fields, which you can see below.

public class BusinessLogic
{
    public int instanceField1;
    public int instanceField2;

    public static int staticField3 = 3;

    public BusinessLogic()
    {
        instanceField1 = 1;
        instanceField2 = 2;
    }
}

I've run the code and created 5 instances of the class above, which we will inspect with WinDBG. To do so, we can execute the following command:

!DumpHeap -type BusinessLogic

That will show something similar to the following:

 Address       MT     Size
01dfa98c 006668c4       16
01dfe898 006668c4       16
01e027a4 006668c4       16
01e0668c 006668c4       16
01e13104 006668c4       16
total 5 objects
Statistics:
      MT    Count    TotalSize Class Name
006668c4        5           80 BusinessLogic
Total 5 objects

Once we have the addresses of the 5 instances we created, we can dump the objects one by one with the command !DumpObj [Address]. Taking the first one, it displays something similar to:

Name: BusinessLogic
MethodTable: 001c68bc
EEClass: 00281abc
Size: 16(0x10) bytes
Fields:
      MT    Field   Offset          Type VT     Attr    Value Name
6f642b38  4000001        4  System.Int32  1 instance        1 instanceField1
6f642b38  4000002        8  System.Int32  1 instance        2 instanceField2
6f642b38  4000003       1c  System.Int32  1   static        3 staticField3

What this command does is examine the fields of one of the BusinessLogic instances in memory. We can see that the object has the fields instanceField1, instanceField2 and staticField3, with the values 1, 2 and 3 respectively. So everything is as expected.

Let's focus now on the cool thing about static fields. We can see how "staticField3" is shared across all the instances we create of BusinessLogic by examining the EEClass of the objects. The 5 instances we created before have the same EEClass: "00281abc". If we examine it with the command !DumpClass 00281abc, we will see that "staticField3" is present at the EEClass level and its value is already initialized.

Class Name: BusinessLogic
mdToken: 02000002
Parent Class: 6f3d3ef0
Module: 001c64f0
Method Table: 001c68bc
Vtable Slots: 4
Total Method Slots: 6
Class Attributes: 100001 
NumInstanceFields: 2
NumStaticFields: 1
      MT    Field   Offset          Type VT     Attr    Value Name
6f642b38  4000001        4  System.Int32  1 instance          instanceField1
6f642b38  4000002        8  System.Int32  1 instance          instanceField2
6f642b38  4000003       1c  System.Int32  1   static        3 staticField3

At this point we have demonstrated how the static fields and their values are shared across all the instances we create of a class.
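
The sharing can also be observed directly from C#. The sketch below repeats the BusinessLogic class so that it compiles on its own (with field initializers instead of the constructor, for brevity); StaticSharingSketch and Observe are names invented for the example.

```csharp
using System;

public class BusinessLogic
{
    public int instanceField1 = 1;
    public int instanceField2 = 2;

    public static int staticField3 = 3;
}

// Hypothetical helper, not part of the original sample.
static class StaticSharingSketch
{
    // Changes one instance field and the static field, then reports
    // what a second, untouched instance observes.
    public static int[] Observe()
    {
        BusinessLogic first = new BusinessLogic();
        BusinessLogic second = new BusinessLogic();

        first.instanceField1 = 10;         // affects only 'first'
        BusinessLogic.staticField3 = 30;   // one location for the whole type

        return new int[] { second.instanceField1, BusinessLogic.staticField3 };
    }

    static void Main()
    {
        int[] seen = Observe();
        // The instance field of 'second' is unchanged, the static field is not.
        Console.WriteLine("instanceField1 = {0}, staticField3 = {1}", seen[0], seen[1]);
    }
}
```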

In this sample we used a value type for the static field, but the same applies to reference types. This is very cool, because it allows us to create class designs where objects that are expensive to construct are instantiated just once, e.g.:

static object staticField;
....
if (staticField == null)
       staticField = new object();

This simple code shares the same instance of staticField across the entire application domain, which helps performance if staticField is hard to construct.

This does not mean that from now on you must make all your "hard to construct" fields static because, as always, the golden hammer does not exist and static fields have a downside.

As we have seen, static field values are tied to the EEClass, which is allocated on the loader heaps; these are AppDomain-specific, so the values will stay in memory until the AppDomain is unloaded.
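
One caveat that the snippet does not address: if several threads can reach the null check at the same time, more than one instance may be created. A minimal way to guard against that, assuming a plain lock is acceptable for the scenario, could look like the sketch below. SharedResource, syncRoot and GetInstance are hypothetical names, and new object() stands in for the expensive construction.

```csharp
using System;

// Hypothetical holder for the shared, expensive-to-build object.
static class SharedResource
{
    private static readonly object syncRoot = new object();
    private static object staticField;

    public static object GetInstance()
    {
        // The lock ensures that only one thread runs the construction
        // and that every caller sees the same instance afterwards.
        lock (syncRoot)
        {
            if (staticField == null)
            {
                staticField = new object();   // stand-in for the costly work
            }
            return staticField;
        }
    }

    static void Main()
    {
        object a = GetInstance();
        object b = GetInstance();
        Console.WriteLine(object.ReferenceEquals(a, b));   // prints True
    }
}
```

Taking the lock on every call is the simplest option, not the fastest one; it is chosen here only because it is easy to verify.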

Executing dynamic IL with DynamicMethod

One of the features we have in .NET is the ability to execute dynamically generated IL code. In fact, there is a complete namespace for generating code at run time, called System.Reflection.Emit.

Prior to .NET Framework 2.0, if we wanted to execute generated IL code we had to deal with a dynamic assembly, using classes like AssemblyBuilder, ModuleBuilder, etc. Together with the IL code we can also emit symbolic information, making it possible to debug the dynamic code even within Visual Studio.

The main downside of generating IL this way is that (by design) the generated code cannot be garbage collected, so the memory is not released until the AppDomain is unloaded. This created some controversy about whether or not it was a memory leak, which in any case is avoidable by loading everything in a separate AppDomain that can be unloaded when it is no longer needed.

With .NET Framework 2.0 we got LCG (Lightweight Code Generation), which allows generating code at run time without having to define a dynamic assembly or a type to contain the methods we create. In addition, the IL and the data structures related to code generation are allocated on the managed heap, meaning they can be garbage collected when there are no more references to the DynamicMethod, the main class of LCG.
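
To make the comparison concrete, here is a rough sketch of the dynamic-assembly route for a method equivalent to the "Talk" example used later in this post. The names TalkAssembly, TalkModule, Talker and DynamicAssemblySketch are invented for the example. It uses the static AssemblyBuilder.DefineDynamicAssembly, which is an assumption about the target runtime: on the framework versions discussed here the equivalent call is AppDomain.CurrentDomain.DefineDynamicAssembly.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper, not part of the original sample.
static class DynamicAssemblySketch
{
    public static MethodInfo BuildTalk()
    {
        // An assembly, a module and a type are all required before
        // a single method can be emitted.
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("TalkAssembly"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("TalkModule");
        TypeBuilder type = module.DefineType("Talker", TypeAttributes.Public);

        MethodBuilder method = type.DefineMethod("Talk",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(string), new Type[] { typeof(string) });

        MethodInfo concat = typeof(string).GetMethod("Concat",
            new Type[] { typeof(string), typeof(string) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldstr, "Hello ");
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Call, concat);
        il.Emit(OpCodes.Ret);

        // The type has to be baked before the method can be invoked.
        Type created = type.CreateType();
        return created.GetMethod("Talk");
    }

    static void Main()
    {
        MethodInfo talk = BuildTalk();
        Console.WriteLine(talk.Invoke(null, new object[] { "world" }));
    }
}
```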

Below you can see some simple code that dynamically emits the IL for a method "Talk" that concatenates two strings.

   1: class Demo
   2: {
   3:     public void RunDynamicCode()
   4:     {
   5:         MethodInfo concatMethodInfo = typeof(string).GetMethod("Concat", 
   6:             new Type[] { typeof(string), typeof(string) });
   7:  
   8:         DynamicMethod dm = new DynamicMethod("Talk", typeof(string), 
   9:             new Type[] { typeof(string) }, this.GetType(), true);
  10:  
  11:         ILGenerator generator = dm.GetILGenerator();
  12:         generator.Emit(OpCodes.Ldstr, "Hello ");
  13:         generator.Emit(OpCodes.Ldarg_0);
  14:         generator.EmitCall(OpCodes.Call, concatMethodInfo, null);
  15:         generator.Emit(OpCodes.Ret);
  16:  
  17:         Func<string,string> talkDelegate =
  18:             (Func<string, string>)dm.CreateDelegate(typeof(Func<string, string>));
  19:  
  20:         string result = talkDelegate("Jose Bonnin");
  21:  
  22:         Console.Write(result);
  23:         Console.ReadKey();
  24:     }
  25: }

Line 8 instantiates the DynamicMethod and specifies the name, the return type, an array with the parameter types and the owner type. The C# signature would be similar to "string Talk(string);".

Lines 11 to 15 are where we write the IL code using the ILGenerator class. Note that in line 14 we emit a call to the MethodInfo for String.Concat that we obtained in line 5.

Finally, in line 17 we do the coolest part: we obtain a delegate for the method we have just generated, which is callable from our C# code.

The main problem with DynamicMethod is that we cannot generate debugging info for LCG, since the debugging API is based on metadata that LCG does not have. In any case, not all is lost, since we can still debug with WinDBG.
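
The same three steps (create the DynamicMethod, emit the IL, obtain a delegate) apply to any signature. As a second, smaller illustration, the sketch below builds a method that adds two integers. AddSketch and BuildAdd are hypothetical names, and the constructor overload without an owner type is used, so the method is not associated with any class of ours.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper, not part of the original sample.
static class AddSketch
{
    public static Func<int, int, int> BuildAdd()
    {
        // Equivalent C# signature: int Add(int, int)
        DynamicMethod dm = new DynamicMethod("Add", typeof(int),
            new Type[] { typeof(int), typeof(int) });

        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);   // push first argument
        il.Emit(OpCodes.Ldarg_1);   // push second argument
        il.Emit(OpCodes.Add);       // pop both, push their sum
        il.Emit(OpCodes.Ret);       // return the value on the stack

        return (Func<int, int, int>)dm.CreateDelegate(typeof(Func<int, int, int>));
    }

    static void Main()
    {
        Func<int, int, int> add = BuildAdd();
        Console.WriteLine(add(2, 3));   // prints 5
    }
}
```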

If we attach WinDBG to the code above, we can check that we really have created a dynamic method and see the generated IL. To do so we need a pointer to the DynamicMethod, which can be obtained in different ways. An easy one for demo purposes is to run !DumpHeap -stat to obtain a list of the objects grouped by type, locate in that list the MT address of the type System.Reflection.Emit.DynamicMethod, and run the command !DumpHeap -mt [address] on it. This will show something similar to:

0:003> !DumpHeap -mt 6fbf026c
Address      MT         Size
01d0e028   6fbf026c    56
total 1 objects
Statistics:
MT         Count   TotalSize    Class Name
6fbf041c        1            56     System.Reflection.Emit.DynamicMethod
Total 1 objects

Now that we have the address (01d0e028) we can run the command !DumpIL to see the dynamic IL generated.

0:003> !DumpIL 01d0e028
This is dynamic IL. Exception info is not reported at this time.
If a token is unresolved, run "!do <addr>" on the addr given
in parenthesis. You can also look at the token table yourself, by
running "!DumpArray 01d0e62c".

IL_0000: ldstr 70000002 "Hello "
IL_0005: ldarg.0
IL_0006: call a000003 (01d0e560)
IL_000b: ret

In this post we have seen that we do not need to look to "new" technologies like WPF, Silverlight or WCF to find cool things within .NET.

Inline Methods

Method inlining is an optimization technique that consists of putting the body of a method inside the body of each of its callers and removing the original method. Consider the following sample:

class Demo
{
    public void Talk()
    {
        SayHello();

        WaitUserAction();
    }

    private void SayHello()
    {
        Console.WriteLine("Hello!");
    }

    ...
}

To inline the method SayHello by hand, we would refactor our code as follows:

class Demo
{
    public void Talk()
    {
        Console.WriteLine("Hello!");

        WaitUserAction();
    }
    ...
}

As you can see, we have placed "Console.WriteLine("Hello!");" directly inside the method Talk.

This technique provides some performance benefit by reducing the overhead associated with a method call. Several characteristics make a method a good candidate for inlining: small size, low complexity, frequent use, etc. Fortunately this is not something we have to think about ourselves; most compilers apply this optimization when they compile our code.

The .NET Framework is no exception. Although it does not have an explicit "inline" keyword to suggest inlining, as you can find in C++, the JIT (Just in Time) compiler performs inlining when it considers that it can help to improve performance.

If we debug the first version of the code, we can see how the methods "Talk" and "SayHello" are inlined. The code must be compiled in Release mode and run outside Visual Studio so that the jitter applies its optimizations, so one way to see it is to debug with WinDBG. To do so, run the code, attach to the executable, load the Son of Strike (SOS) extension and run the command !EEStack -EE. This will display something similar to:

0015ee28 793e8e33 (MethodDesc 0x79259c00 +0x7  System.Console.ReadKey())
0015ee2c 009100ea (MethodDesc 0x3030a0   +0x32  Demo.WaitUserAction())
0015ee3c 009100a7 (MethodDesc 0x303028   +0x37  Program.Main(System.String[]))

As we can see, the stack does not show the methods "Talk" and "SayHello", and if we run the command !DumpMT -MD [address of method table] we will see that SOS does not report the methods as jitted even though we know their code has been executed. This is because, after inlining, their bodies have been expanded within the Main method and the methods themselves do "not exist" at all.

79371278   7914b928   PreJIT System.Object.ToString()
7936b3b0   7914b930   PreJIT System.Object.Equals(System.Object)
7936b3d0   7914b948   PreJIT System.Object.GetHashCode()
793624d0   7914b950   PreJIT System.Object.Finalize()
0030c035   00303090   NONE Demo.Talk()
0030c039   00303098   NONE Demo.SayHello()
009100b8   003030a0   JIT Demo.WaitUserAction()
0030c041   003030a8   NONE Demo..ctor()

Just for fun, we can change the code and force the jitter not to inline our methods. This is accomplished by adding the attribute [MethodImpl(MethodImplOptions.NoInlining)] on top of the method definitions for "Talk" and "SayHello". If we recompile and debug with WinDBG again, we will see the differences.
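
As a sketch of what that change could look like, the code below applies the attribute to a reduced version of the Demo class (WaitUserAction is left out to keep it short). The NoInliningSketch helper is invented for the example: it only reads the flag back through reflection, which confirms that the attribute was applied but says nothing about what the jitter does with unmarked methods.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

class Demo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public void Talk()
    {
        SayHello();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void SayHello()
    {
        Console.WriteLine("Hello!");
    }
}

// Hypothetical helper, not part of the original sample.
static class NoInliningSketch
{
    // Returns true when the named Demo method carries the NoInlining flag.
    public static bool IsMarkedNoInlining(string methodName)
    {
        MethodInfo method = typeof(Demo).GetMethod(methodName,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        MethodImplAttributes flags = method.GetMethodImplementationFlags();
        return (flags & MethodImplAttributes.NoInlining) != 0;
    }

    static void Main()
    {
        new Demo().Talk();
        Console.WriteLine(IsMarkedNoInlining("Talk"));
        Console.WriteLine(IsMarkedNoInlining("SayHello"));
    }
}
```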

Now the stack shows the calls to the methods "Talk" and "SayHello" and both methods appear as jitted.

0021ed10 793e8e33 (MethodDesc 0x79259c00 +0x7 System.Console.ReadKey())
0021ed14 009a0132 (MethodDesc 0x8830a0 +0x32 Demo.WaitUserAction())
0021ed24 009a00ed (MethodDesc 0x883098 +0x2d Demo.SayHello())
0021ed28 009a00a6 (MethodDesc 0x883090 +0x6 Demo.Talk())
0021ed2c 009a0084 (MethodDesc 0x883028 +0x14 Program.Main(System.String[]))

79371278 7914b928 PreJIT System.Object.ToString()
7936b3b0 7914b930 PreJIT System.Object.Equals(System.Object)
7936b3d0 7914b948 PreJIT System.Object.GetHashCode()
793624d0 7914b950 PreJIT System.Object.Finalize()
009a00a0 00883090 JIT Demo.Talk()
009a00c0 00883098 JIT Demo.SayHello()
009a0100 008830a0 JIT Demo.WaitUserAction()
0088c041 008830a8 NONE Demo..ctor()

You can go one step further and check not only the stacks but also how the generated code contains the inlined instructions. If we use the command !u within WinDBG on each of the methods "Main", "Talk" and "SayHello" when they are not inlined, we obtain something similar to this:

Program.Main(System.String[])
Begin 009a0070, size 15
009a0070 b9b0308800 mov ecx,8830B0h (MT: Demo)
009a0075 e8a21f77ff call 0011201c (JitHelp: CORINFO_HELP_NEWSFAST)
009a007a 8bc8 mov ecx,eax
009a007c 3909 cmp dword ptr [ecx],ecx
009a007e ff15e8308800 call dword ptr ds:[8830E8h] (Demo.Talk(), mdToken: 06000003)
009a0084 c3 ret

Demo.Talk()
Begin 009a00a0, size 7
009a00a0 ff15ec308800 call dword ptr ds:[8830ECh] (Demo.SayHello(), mdToken: 06000004)
009a00a6 c3 ret

Demo.SayHello()
Begin 002500c0, size 2e
002500c0 833d8c10f20200 cmp dword ptr ds:[2F2108Ch],0
002500c7 750a jne 002500d3
002500c9 b901000000 mov ecx,1
002500ce e891581179 call mscorlib_ni+0x2a5964 (79365964) (System.Console.InitializeStdOutError(Boolean), mdToken: 06000770)
002500d3 8b0d8c10f202 mov ecx,dword ptr ds:[2F2108Ch] (Object: System.IO.TextWriter+SyncTextWriter)
002500d9 8b153c30f202 mov edx,dword ptr ds:[2F2303Ch] ("Hello!")
002500df 8b01 mov eax,dword ptr [ecx]
002500e1 ff90d8000000 call dword ptr [eax+0D8h]
002500e7 ff15f0301b00 call dword ptr ds:[1B30F0h] (Demo.WaitUserAction(), mdToken: 06000005)
002500ed c3 ret

When the methods are inlined we can no longer run the command !u on them, but if we run it on the Main method we can see, without being experts in native code, that the code now contains the SayHello instructions.

Program.Main(System.String[])
Begin 00970070, size 38
00970070 b9b0303a00 mov ecx,3A30B0h (MT: Demo)
00970075 e8a21fa2ff call 0039201c (JitHelp: CORINFO_HELP_NEWSFAST)
0097007a 833d8c10c80200 cmp dword ptr ds:[2C8108Ch],0
00970081 750a jne 0097008d
00970083 b901000000 mov ecx,1
00970088 e8d7589f78 call mscorlib_ni+0x2a5964 (79365964) (System.Console.InitializeStdOutError(Boolean), mdToken: 06000770)
0097008d 8b0d8c10c802 mov ecx,dword ptr ds:[2C8108Ch] (Object: System.IO.TextWriter+SyncTextWriter)
00970093 8b153c30c802 mov edx,dword ptr ds:[2C8303Ch] ("Hello!")
00970099 8b01 mov eax,dword ptr [ecx]
0097009b ff90d8000000 call dword ptr [eax+0D8h]
009700a1 ff15f0303a00 call dword ptr ds:[3A30F0h] (Demo.WaitUserAction(), mdToken: 06000005)
009700a7 c3 ret

As you can see, there are many things to explore and have fun with in WinDBG.