Mad about .NET A blog from Jose Fco Bonnin


These days I've been updating the CA settings on some old projects, I wanted to modify the configuration of certain rules that were removed with the shipping of Visual Studio 2008. You can obtain detailed information about what rules are shipped with each version of CA in this post of the FxCop blog.

As you can see in the post, some of the rules were removed because they do not apply anymore or because they were too noisy compared with the benefit introduced. One of the rules that make feel more upset when I knew it was removed was "CA1818 - Do not concatenate strings inside loops". This rule threw an error (I will continue in my imaginary world where everybody sets warnings as errors in production code) with code like:

   1:  string s = string.Empty;
   2:  for (int i = 0; i < 5; i++)
   3:  {
   4:      s += i.ToString();
   5:  }

One of the first things you learn in .NET is about the immutability of the strings, you can find lot of literature talking about how to handle properly strings. Even Improving .NET Application Performance and Scalability, one of the best papers I've seen about .NET performance, makes an explicit reference to do not concatenate strings when the number of concatenations is unknown.

So, today that I've been working again with the rule, I had the curiosity (hope) to verify the rule was not removed because it was too noisy but because the CLR was improved to avoid the issue. I know this is again living in my imaginary world, but I'm a bit stubborn...what I've done is to compile a console project with the code above with 2.0 (VS 2005) and 3.5 (VS 2008), first one fires the error, second one doesn't. First step has been to look for differences in the IL generated, both cases have:

   1:  .entrypoint
   2:  // Code size       39 (0x27)
   3:  .maxstack  2
   4:  .locals init ([0] string s,
   5:         [1] int32 i)
   6:  IL_0000:  ldsfld   string [mscorlib]System.String::Empty
   7:  IL_0005:  stloc.0
   8:  IL_0006:  ldc.i4.0
   9:  IL_0007:  stloc.1
  10:  IL_0008:  br.s     IL_001c
  11:  IL_000a:  ldloc.0
  12:  IL_000b:  ldloca.s i
  13:  IL_000d:  call     instance string 
  14:                     [mscorlib]System.Int32::ToString()
  15:  IL_0012:  call     string [mscorlib]System.String::Concat(string,
  16:                                                            string)
  17:  IL_0017:  stloc.0
  18:  IL_0018:  ldloc.1
  19:  IL_0019:  ldc.i4.1
  20:  IL_001a:  add
  21:  IL_001b:  stloc.1
  22:  IL_001c:  ldloc.1
  23:  IL_001d:  ldc.i4.5
  24:  IL_001e:  blt.s    IL_000a
  25:  IL_0020:  call     valuetype [mscorlib]System.ConsoleKeyInfo 
  26:                     [mscorlib]System.Console::ReadKey()
  27:  IL_0025:  pop
  28:  IL_0026:  ret

No improvements on the compiler side. Next step has been to verify the native code generated, for that I used the command !u at WinDbg (in this previous post you can see how to obtain the native code after the method is jitted). The code for both projects looks like (note that some memory addresses will be different):

   1:  CA1818.Program.Main(System.String[])
   2:  Begin 009d0070, size 55
   3:  009d0070 55           push ebp
   4:  009d0071 8bec         mov  ebp,esp
   5:  009d0073 57           push edi
   6:  009d0074 56           push esi
   7:  009d0075 83ec10       sub  esp,10h
   8:  009d0078 33c0         xor  eax,eax
   9:  009d007a 8945f4       mov  dword ptr [ebp-0Ch],eax
  10:  009d007d 8b3d2c10d602 mov  edi,dword ptr ds:[2D6102Ch] ("")
  11:  009d0083 33d2         xor  edx,edx
  12:  009d0085 8955f4       mov  dword ptr [ebp-0Ch],edx
  13:  009d0088 837df405     cmp  dword ptr [ebp-0Ch],5
  14:  009d008c 7d26         jge  009d00b4
  15:  009d008e 8b75f4       mov  esi,dword ptr [ebp-0Ch]
  16:  009d0091 e80a0cca6f   call mscorlib_ni+0x220ca0 (70670ca0) 
  17:  009d0096 50           push eax
  18:  009d0097 8bce         mov  ecx,esi
  19:  009d0099 33d2         xor  edx,edx
  20:  009d009b e8bad00971   call 
  21:                        mscorwks!LogHelp_TerminateOnAssert+0xb82 
  22:                        (71a6d15a) 
  23:  009d00a0 8bd0         mov  edx,eax
  24:  009d00a2 8bcf         mov  ecx,edi
  25:  009d00a4 e8a7ebc36f   call mscorlib_ni+0x1bec50 (7060ec50)
  26:  009d00a9 8bf8         mov  edi,eax
  27:  009d00ab ff45f4       inc  dword ptr [ebp-0Ch]
  28:  009d00ae 837df405     cmp  dword ptr [ebp-0Ch],5
  29:  009d00b2 7cda         jl   009d008e
  30:  009d00b4 8d4de8       lea  ecx,[ebp-18h]
  31:  009d00b7 33d2         xor  edx,edx
  32:  009d00b9 e8f6381570   call mscorlib_ni+0x6d39b4 (70b239b4)
  33:  009d00be 8d65f8       lea  esp,[ebp-8]
  34:  009d00c1 5e           pop  esi
  35:  009d00c2 5f           pop  edi
  36:  009d00c3 5d           pop  ebp
  37:  009d00c4 c3           ret

We see there are no improvements either on the jitter side. The next to verify is if the strings are being discarded on each iteration creating a new one or not. To do that I used some good profilers that are on the market but in the end I decided to show how I did it with WinDbg because anybody can download it for free.

With !DumpHeap -type System.String we can see all the string instances of our application. This returns a list with the memory address of the string instances, to view the contents of the object we just need to do a !do (dump object) of the address we want to check. So, just taking some samples from the instances with higher memory addresses we can already see the next:

   1:  0:003> !do 01c83ae0 
   2:  Name: System.String
   3:  MethodTable: 706c0a00
   4:  EEClass: 7047d64c
   5:  Size: 24(0x18) bytes
   6:  String: 012
   7:   
   8:  0:003> !do 01c83af8 
   9:  Name: System.String
  10:  MethodTable: 706c0a00
  11:  EEClass: 7047d64c
  12:  Size: 20(0x14) bytes
  13:  String: 3
  14:   
  15:  0:003> !do 01c83b0c
  16:  Name: System.String
  17:  MethodTable: 706c0a00
  18:  EEClass: 7047d64c
  19:  Size: 26(0x1a) bytes
  20:  String: 0123
  21:   
  22:  0:003> !do 01c83b28
  23:  Name: System.String
  24:  MethodTable: 706c0a00
  25:  EEClass: 7047d64c
  26:  Size: 20(0x14) bytes
  27:  String: 4
  28:   
  29:  0:003> !do 01c83b3c 
  30:  Name: System.String
  31:  MethodTable: 706c0a00
  32:  EEClass: 7047d64c
  33:  Size: 28(0x1c) bytes
  34:  String: 01234

From the results above we can see how on each iteration we have two strings, the resulting of i.ToString() and the resulting of concatenation.

Conclusion, we still creating new strings and discarding the previous version for GC on each manipulation of the string. Therefore, I suppose the rule was removed just because it was too noisy.

Once I stopped playing with WinDbg I come back to the earth to say what I wanted to say from the beginning. It's pity to see that a performance issue that has been repeated so many times, now is ignored just because the rule is too noisy. I know lot of people could argue that using the StringBuilder we also discard old versions of strings when the size of the string becomes bigger than the buffer, but still better than discarding all modifications.

I don't understand why a good string handling was that important before and now is just ignored by the main CA tool used by .NET developers.




Comments

March 30 2009

Juan M. Servera

Hi Jose, see my (spanish) comments about your great entry in my blog: iremote.blogspot.com/.../...ones-detras-de-un.html


Juan M. Servera

Add comment


(Will show your Gravatar icon)

  Country flag

Captcha Image biuquote
  • Comment
  • Preview
Loading