Static Code Analysis

Recently I’ve been updating the Code Analysis (CA) settings on some old projects, and I wanted to change the configuration of certain rules that were removed when Visual Studio 2008 shipped. You can find detailed information about which rules ship with each version of CA in this post on the FxCop blog.

As the post explains, some rules were removed because they no longer apply, or because they were too noisy compared with the benefit they brought. The one whose removal upset me most was “CA1818 – Do not concatenate strings inside loops”. This rule threw an error (I will stay in my imaginary world where everybody treats warnings as errors in production code) for code like:

string s = string.Empty;
for (int i = 0; i < 5; i++)
    s += i.ToString();   // CA1818: each += allocates a new string

One of the first things you learn in .NET is that strings are immutable, and there is plenty of literature on how to handle strings properly. Even Improving .NET Application Performance and Scalability, one of the best papers I’ve seen on .NET performance, explicitly recommends not concatenating strings when the number of concatenations is unknown.
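For reference, the usual fix for what CA1818 reported is to accumulate into a StringBuilder. Below is a minimal sketch of the loop above rewritten that way; the class and method names (LoopConcat, WithBuilder) are hypothetical, chosen only for illustration.

```csharp
using System.Text;

// Hypothetical helper: the CA1818 example loop rewritten with a StringBuilder.
// The appends write into the builder's internal buffer instead of allocating
// a new string per iteration; only the final ToString() creates the result.
public static class LoopConcat
{
    public static string WithBuilder(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(i); // appends the same text i.ToString() would produce
        return sb.ToString();
    }
}
```

LoopConcat.WithBuilder(5) returns "01234", the same value the original loop leaves in s.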

So today, working with the rule again, I was curious (hopeful, really) to check whether the rule had been removed not because it was too noisy but because the CLR had been improved to avoid the issue. I know this is my imaginary world again, but I’m a bit stubborn… What I did was compile a console project containing the code above against .NET 2.0 (VS 2005) and 3.5 (VS 2008): the first one fires the error, the second one doesn’t. The first step was to look for differences in the generated IL; in both cases it is:

// Code size       39 (0x27)
.maxstack  2
.locals init ([0] string s,
       [1] int32 i)
IL_0000:  ldsfld   string [mscorlib]System.String::Empty
IL_0005:  stloc.0
IL_0006:  ldc.i4.0
IL_0007:  stloc.1
IL_0008:  br.s     IL_001c
IL_000a:  ldloc.0
IL_000b:  ldloca.s i
IL_000d:  call     instance string 
IL_0012:  call     string [mscorlib]System.String::Concat(string,
IL_0017:  stloc.0
IL_0018:  ldloc.1
IL_0019:  ldc.i4.1
IL_001a:  add
IL_001b:  stloc.1
IL_001c:  ldloc.1
IL_001d:  ldc.i4.5
IL_001e:  blt.s    IL_000a
IL_0020:  call     valuetype [mscorlib]System.ConsoleKeyInfo 
IL_0025:  pop
IL_0026:  ret
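Read back as C#, the IL above shows that += on a string is nothing more than a String.Concat(string, string) call whose result is stored back into s on every iteration. The hand-written equivalent below makes that explicit; the class and method names are hypothetical, and it returns s instead of making the call at IL_0020 whose result is discarded with pop.

```csharp
// Hypothetical hand-written equivalent of the IL above.
public static class LoweredLoop
{
    public static string Build()
    {
        string s = string.Empty;                // IL_0000-IL_0005: load String.Empty, store into s
        for (int i = 0; i < 5; i++)             // IL_0018-IL_001e: i + 1, compare with 5, loop back
            s = string.Concat(s, i.ToString()); // IL_000a-IL_0017: load s, i.ToString(), Concat, store into s
        return s;
    }
}
```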

No improvements on the compiler side, then. The next step was to check the generated native code; for that I used the !u command in WinDbg (the post Inline Methods shows how to obtain the native code once a method has been jitted). The code for both projects looks like this (some memory addresses will differ):

Begin 009d0070, size 55
009d0070 55           push ebp
009d0071 8bec         mov  ebp,esp
009d0073 57           push edi
009d0074 56           push esi
009d0075 83ec10       sub  esp,10h
009d0078 33c0         xor  eax,eax
009d007a 8945f4       mov  dword ptr [ebp-0Ch],eax
009d007d 8b3d2c10d602 mov  edi,dword ptr ds:[2D6102Ch] ("")
009d0083 33d2         xor  edx,edx
009d0085 8955f4       mov  dword ptr [ebp-0Ch],edx
009d0088 837df405     cmp  dword ptr [ebp-0Ch],5
009d008c 7d26         jge  009d00b4
009d008e 8b75f4       mov  esi,dword ptr [ebp-0Ch]
009d0091 e80a0cca6f   call mscorlib_ni+0x220ca0 (70670ca0) 
009d0096 50           push eax
009d0097 8bce         mov  ecx,esi
009d0099 33d2         xor  edx,edx
009d009b e8bad00971   call 
009d00a0 8bd0         mov  edx,eax
009d00a2 8bcf         mov  ecx,edi
009d00a4 e8a7ebc36f   call mscorlib_ni+0x1bec50 (7060ec50)
009d00a9 8bf8         mov  edi,eax
009d00ab ff45f4       inc  dword ptr [ebp-0Ch]
009d00ae 837df405     cmp  dword ptr [ebp-0Ch],5
009d00b2 7cda         jl   009d008e
009d00b4 8d4de8       lea  ecx,[ebp-18h]
009d00b7 33d2         xor  edx,edx
009d00b9 e8f6381570   call mscorlib_ni+0x6d39b4 (70b239b4)
009d00be 8d65f8       lea  esp,[ebp-8]
009d00c1 5e           pop  esi
009d00c2 5f           pop  edi
009d00c3 5d           pop  ebp
009d00c4 c3           ret

There are no improvements on the JIT side either (not surprising, since .NET 3.5 still runs on version 2.0 of the CLR). The next thing to verify is whether each iteration creates a new string and discards the previous one. I tried some good profilers on the market for this, but in the end I decided to show how to do it with WinDbg, because anybody can download it for free.

Running !DumpHeap -type System.String lists all the string instances in our application along with their memory addresses. To view the contents of one of them, we just run !do (dump object) on its address. Taking a few samples from the instances at the highest addresses, we can already see the following:

0:003> !do 01c83ae0 
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 24(0x18) bytes
String: 012
0:003> !do 01c83af8 
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 20(0x14) bytes
String: 3
0:003> !do 01c83b0c
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 26(0x1a) bytes
String: 0123
0:003> !do 01c83b28
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 20(0x14) bytes
String: 4
0:003> !do 01c83b3c 
Name: System.String
MethodTable: 706c0a00
EEClass: 7047d64c
Size: 28(0x1c) bytes
String: 01234

The results above show two new strings per iteration: the result of i.ToString() and the result of the concatenation.
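The same sequence can be reproduced without a debugger by replaying the loop and recording both strings of each iteration. This is a minimal sketch with a hypothetical LoopTrace helper; it also counts the characters held by all the concatenation results, which grows quadratically because each iteration copies everything accumulated so far.

```csharp
using System.Collections.Generic;

// Hypothetical helper that replays the CA1818 loop.
public static class LoopTrace
{
    // Records, in order, the two strings each iteration produces.
    public static List<string> Run(int count)
    {
        List<string> produced = new List<string>();
        string s = string.Empty;
        for (int i = 0; i < count; i++)
        {
            string digits = i.ToString(); // string #1: i.ToString()
            s = s + digits;               // string #2: the concatenation
            produced.Add(digits);
            produced.Add(s);
        }
        return produced;
    }

    // Total characters in all concatenation results: each one contains
    // (and therefore copies) everything built so far.
    public static int CharsCopied(int count)
    {
        int total = 0;
        string s = string.Empty;
        for (int i = 0; i < count; i++)
        {
            s += i.ToString();
            total += s.Length;
        }
        return total;
    }
}
```

LoopTrace.Run(5) ends with "012", "3", "0123", "4", "01234", the same sequence as the heap samples above, and LoopTrace.CharsCopied(5) is 1 + 2 + 3 + 4 + 5 = 15.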

In conclusion, we are still creating a new string on every manipulation and leaving the previous version for the GC to collect. So I suppose the rule was removed simply because it was too noisy.
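The conclusion can also be checked by measurement rather than by inspecting the heap. On modern .NET (not the 2.0/3.5 runtimes discussed in this post), GC.GetAllocatedBytesForCurrentThread() reports the bytes allocated by the current thread. The sketch below uses a hypothetical AllocationProbe class to compare the two loop styles; it is indicative only.

```csharp
using System;
using System.Text;

// Hypothetical probe comparing bytes allocated by the two loop styles.
// Assumes a modern .NET runtime that provides GC.GetAllocatedBytesForCurrentThread.
public static class AllocationProbe
{
    // The pattern CA1818 flagged: a new string per iteration.
    static string Concatenate(int count)
    {
        string s = string.Empty;
        for (int i = 0; i < count; i++)
            s += i.ToString();
        return s;
    }

    // The StringBuilder alternative.
    static string Build(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(i);
        return sb.ToString();
    }

    // Bytes allocated on the current thread by one call to work(count).
    static long Measure(Func<int, string> work, int count)
    {
        work(count); // warm-up run, to keep one-time startup allocations out of the measurement
        long before = GC.GetAllocatedBytesForCurrentThread();
        string result = work(count);
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(result);
        return after - before;
    }

    public static long ConcatenateBytes(int count) { return Measure(Concatenate, count); }
    public static long BuilderBytes(int count) { return Measure(Build, count); }
}
```

With count = 1000, ConcatenateBytes should come out far larger than BuilderBytes, because each of the 1000 intermediate strings copies everything built so far. The exact figures depend on the runtime version and bitness, so only the comparison is meaningful.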

Now that I’ve stopped playing with WinDbg, back to earth to say what I wanted to say from the beginning. It’s a pity to see a performance issue that has been repeated so many times now simply ignored because the rule is too noisy. I know a lot of people could argue that StringBuilder also discards old versions of the string when it grows bigger than the buffer, but that is still better than discarding the result of every modification.
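When the final length is known or can be estimated, passing an initial capacity to the StringBuilder constructor avoids even that buffer growth. A minimal sketch, assuming the length can be computed up front; the PresizedBuilder class and its DigitCount helper are hypothetical, not something from the original post.

```csharp
using System.Text;

// Hypothetical helper: builds "0123..." with a pre-sized StringBuilder.
public static class PresizedBuilder
{
    // Number of decimal digits in a non-negative integer.
    static int DigitCount(int n)
    {
        int digits = 1;
        while (n >= 10) { n /= 10; digits++; }
        return digits;
    }

    // capacityChanged reports whether the builder had to grow its buffer.
    public static string Build(int count, out bool capacityChanged)
    {
        int needed = 0;
        for (int i = 0; i < count; i++)
            needed += DigitCount(i);

        StringBuilder sb = new StringBuilder(needed); // reserve everything up front
        int initialCapacity = sb.Capacity;
        for (int i = 0; i < count; i++)
            sb.Append(i);
        capacityChanged = sb.Capacity != initialCapacity;
        return sb.ToString();
    }
}
```

For count = 1000 the builder reserves 10 + 90 × 2 + 900 × 3 = 2890 characters and never needs to grow. The design choice is simply trading a cheap length calculation for the reallocations.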

I don’t understand why good string handling was so important before and is now simply ignored by the main CA tool used by .NET developers.