These days I've been updating the CA settings on some old projects, I wanted to modify the configuration of certain rules that were removed with the shipping of Visual Studio 2008. You can obtain detailed information about what rules are shipped with each version of CA in this post of the FxCop blog.
As you can see in the post, some of the rules were removed because they do not apply anymore or because they were too noisy compared with the benefit introduced. One of the rules that make feel more upset when I knew it was removed was "CA1818 - Do not concatenate strings inside loops". This rule threw an error (I will continue in my imaginary world where everybody sets warnings as errors in production code) with code like:
1: string s = string.Empty;
2: for (int i = 0; i < 5; i++)
3: { 4: s += i.ToString();
5: }
One of the first things you learn in .NET is about the immutability of the strings, you can find lot of literature talking about how to handle properly strings. Even Improving .NET Application Performance and Scalability, one of the best papers I've seen about .NET performance, makes an explicit reference to do not concatenate strings when the number of concatenations is unknown.
So, today that I've been working again with the rule, I had the curiosity (hope) to verify the rule was not removed because it was too noisy but because the CLR was improved to avoid the issue. I know this is again living in my imaginary world, but I'm a bit stubborn...what I've done is to compile a console project with the code above with 2.0 (VS 2005) and 3.5 (VS 2008), first one fires the error, second one doesn't. First step has been to look for differences in the IL generated, both cases have:
1: .entrypoint
2: // Code size 39 (0x27)
3: .maxstack 2
4: .locals init ([0] string s,
5: [1] int32 i)
6: IL_0000: ldsfld string [mscorlib]System.String::Empty
7: IL_0005: stloc.0
8: IL_0006: ldc.i4.0
9: IL_0007: stloc.1
10: IL_0008: br.s IL_001c
11: IL_000a: ldloc.0
12: IL_000b: ldloca.s i
13: IL_000d: call instance string
14: [mscorlib]System.Int32::ToString()
15: IL_0012: call string [mscorlib]System.String::Concat(string,
16: string)
17: IL_0017: stloc.0
18: IL_0018: ldloc.1
19: IL_0019: ldc.i4.1
20: IL_001a: add
21: IL_001b: stloc.1
22: IL_001c: ldloc.1
23: IL_001d: ldc.i4.5
24: IL_001e: blt.s IL_000a
25: IL_0020: call valuetype [mscorlib]System.ConsoleKeyInfo
26: [mscorlib]System.Console::ReadKey()
27: IL_0025: pop
28: IL_0026: ret
No improvements on the compiler side. Next step has been to verify the native code generated, for that I used the command !u at WinDbg (in this previous post you can see how to obtain the native code after the method is jitted). The code for both projects looks like (note that some memory addresses will be different):
1: CA1818.Program.Main(System.String[])
2: Begin 009d0070, size 55
3: 009d0070 55 push ebp
4: 009d0071 8bec mov ebp,esp
5: 009d0073 57 push edi
6: 009d0074 56 push esi
7: 009d0075 83ec10 sub esp,10h
8: 009d0078 33c0 xor eax,eax
9: 009d007a 8945f4 mov dword ptr [ebp-0Ch],eax
10: 009d007d 8b3d2c10d602 mov edi,dword ptr ds:[2D6102Ch] ("") 11: 009d0083 33d2 xor edx,edx
12: 009d0085 8955f4 mov dword ptr [ebp-0Ch],edx
13: 009d0088 837df405 cmp dword ptr [ebp-0Ch],5
14: 009d008c 7d26 jge 009d00b4
15: 009d008e 8b75f4 mov esi,dword ptr [ebp-0Ch]
16: 009d0091 e80a0cca6f call mscorlib_ni+0x220ca0 (70670ca0)
17: 009d0096 50 push eax
18: 009d0097 8bce mov ecx,esi
19: 009d0099 33d2 xor edx,edx
20: 009d009b e8bad00971 call
21: mscorwks!LogHelp_TerminateOnAssert+0xb82
22: (71a6d15a)
23: 009d00a0 8bd0 mov edx,eax
24: 009d00a2 8bcf mov ecx,edi
25: 009d00a4 e8a7ebc36f call mscorlib_ni+0x1bec50 (7060ec50)
26: 009d00a9 8bf8 mov edi,eax
27: 009d00ab ff45f4 inc dword ptr [ebp-0Ch]
28: 009d00ae 837df405 cmp dword ptr [ebp-0Ch],5
29: 009d00b2 7cda jl 009d008e
30: 009d00b4 8d4de8 lea ecx,[ebp-18h]
31: 009d00b7 33d2 xor edx,edx
32: 009d00b9 e8f6381570 call mscorlib_ni+0x6d39b4 (70b239b4)
33: 009d00be 8d65f8 lea esp,[ebp-8]
34: 009d00c1 5e pop esi
35: 009d00c2 5f pop edi
36: 009d00c3 5d pop ebp
37: 009d00c4 c3 ret
We see there are no improvements either on the jitter side. The next to verify is if the strings are being discarded on each iteration creating a new one or not. To do that I used some good profilers that are on the market but in the end I decided to show how I did it with WinDbg because anybody can download it for free.
With !DumpHeap -type System.String we can see all the string instances of our application. This returns a list with the memory address of the string instances, to view the contents of the object we just need to do a !do (dump object) of the address we want to check. So, just taking some samples from the instances with higher memory addresses we can already see the next:
1: 0:003> !do 01c83ae0
2: Name: System.String
3: MethodTable: 706c0a00
4: EEClass: 7047d64c
5: Size: 24(0x18) bytes
6: String: 012
7:
8: 0:003> !do 01c83af8
9: Name: System.String
10: MethodTable: 706c0a00
11: EEClass: 7047d64c
12: Size: 20(0x14) bytes
13: String: 3
14:
15: 0:003> !do 01c83b0c
16: Name: System.String
17: MethodTable: 706c0a00
18: EEClass: 7047d64c
19: Size: 26(0x1a) bytes
20: String: 0123
21:
22: 0:003> !do 01c83b28
23: Name: System.String
24: MethodTable: 706c0a00
25: EEClass: 7047d64c
26: Size: 20(0x14) bytes
27: String: 4
28:
29: 0:003> !do 01c83b3c
30: Name: System.String
31: MethodTable: 706c0a00
32: EEClass: 7047d64c
33: Size: 28(0x1c) bytes
34: String: 01234
From the results above we can see how on each iteration we have two strings, the resulting of i.ToString() and the resulting of concatenation.
Conclusion, we still creating new strings and discarding the previous version for GC on each manipulation of the string. Therefore, I suppose the rule was removed just because it was too noisy.
Once I stopped playing with WinDbg I come back to the earth to say what I wanted to say from the beginning. It's pity to see that a performance issue that has been repeated so many times, now is ignored just because the rule is too noisy. I know lot of people could argue that using the StringBuilder we also discard old versions of strings when the size of the string becomes bigger than the buffer, but still better than discarding all modifications.
I don't understand why a good string handling was that important before and now is just ignored by the main CA tool used by .NET developers.