Groups | Blog | Home
all groups > dotnet clr > may 2006 >

dotnet clr : Fast tailcall


Ole Nielsby
5/11/2006 12:48:39 AM
Are there any things one can do to assist the jit in generating
efficient tail call optimization?

If a method with many parameters ends calling a similar
method with a similar but not identical signature, and some
of the parameters are passed unchanged, will the jit then
discover it can simply leave the parameters on the stack?

If two signatures are of differnt length but have some
common parameters that are, in some cases, passed
unchanged from one call to the other, would it be better
to place these parameters in the beginning or in the end
of the parameter list?

Will tail calls be faster if the signatures are padded with
dummies to have the same length?

Thanks/Ole N.

Tasos Vogiatzoglou
5/16/2006 6:10:54 AM
I remember seeing a tail jump command in CLR but I do not think that
the compiler supports that.

I've tried some experiments that would for sure lead to tail call
optimization but the disassembly didn't indicate a tail jump.

Is tail jump supported by the C# compiler ?

Regards
Tasos
Tasos Vogiatzoglou
5/16/2006 11:39:02 AM
I don't think this is the reason (after all tailcall opcode exists).
Sure it poses some restrictions but the CLR documentation states that
only full trusted code can perform a tailcall.
Barry Kelly
5/16/2006 2:45:17 PM
[quoted text, click to view]

I'm fairly sure it isn't. One of the biggest reasons (AFAIK) is because
it removes a frame from the stack for code access security checks.

Jeremy Chapman
5/16/2006 3:44:55 PM
Isn't there a way to ensure the code only runs fully trusted (using
attributes)? If so, then you could know at compile time whether or not you
can implement this optimization.

[quoted text, click to view]

Barry Kelly
5/16/2006 8:27:51 PM
[quoted text, click to view]

This is a brilliant reason why the optimization can't be implemented
generally, isn't it?

Since the C# compiler would have to choose at *compile* time (not *JIT
compile* time) to use tail.call, how could it know if the code is going
to be fully trusted or not?

It makes perfect sense as the reason to me.

Ole Nielsby
5/16/2006 9:59:20 PM

[quoted text, click to view]

Which seems consistent with Barry K.'s explanation: If security
checks require the frame to be there, untrusted code can't be
allowed to tail jump.

I'm still puzzled by the clr design (was this restriction really
necessary or is it there because of a half-hearted tail call
implementation?) but the C# compiler team's decision not
to support tail calls does make sense to me, at last.

Regards/Ole Nielsby

Tasos Vogiatzoglou
5/16/2006 11:14:20 PM
I assume that if the tail. command is supported by the jit (I do not
think that is supported) it will be used in rather rare conditions of
fully trusted code and perhaps code that does not access the execution
stack (via stacktrace or sth) .

But consider the following :

method private hidebysig instance int32 factorial(int32 x,int32 a) cil
managed
{
// Code size 19 (0x13)
.maxstack 4

IL_0000: ldarg.1
IL_0001: ldc.i4.1

IL_0002: bge.s IL_0006
IL_0004: ldarg.2

IL_0005: ret
IL_0006: ldarg.0

IL_0007: ldarg.1
IL_0008: ldc.i4.1

IL_0009: sub
IL_000a: ldarg.1
IL_000b: ldarg.2
IL_000c: mul

IL_000d: call instance int32
Class1::factorial(int32,int32)
IL_0012: ret
} // end of method Class1::factorial

This is an optimized version of a factorial method by the C# compiler.

Consider also this


method private hidebysig instance int32 factorial(int32 x,int32 a) cil
managed
{
// Code size 19 (0x13)
.maxstack 4

IL_0000: ldarg.1
IL_0001: ldc.i4.1

IL_0002: bge.s IL_0006
IL_0004: ldarg.2

IL_0005: ret
IL_0006: ldarg.0

IL_0007: ldarg.1
IL_0008: ldc.i4.1

IL_0009: sub
IL_000a: ldarg.1
IL_000b: ldarg.2
IL_000c: mul
IL_000d: tail.
IL_000e: call instance int32
Class1::factorial(int32,int32)
IL_0012: ret
} // end of method Class1::factorial

The second is a method that discards the stack (tail.) . If I am not
doing something wrong there is a big drop of the speed when I added the
tail. command.

514 ms (with tailcall) / 77 ms (without tailcall).

I cannot understand this ... Can anyone provide any helpful insight ?

Regards,
Tasos
Barry Kelly
5/17/2006 2:35:41 AM
[quoted text, click to view]

It might make sense via a compiler switch. .NET modules(1) can be
compiled separately and linked together later to form an assembly, so
any such security attributes may be in a different module from one being
compiled to support tail.call, so just an attribute probably wouldn't do
the trick.

(1) See csc flags /target:module, /addmodule:<file list>

Tasos Vogiatzoglou
5/17/2006 6:14:17 AM
Wonderful :) This was wonderful ...
Barry Kelly
5/17/2006 2:01:37 PM
[quoted text, click to view]

I think it is not optimized because no mainstream language currently
uses it. It certainly is implemented in so far as the stack does not
grow when you use tail. call to jump to the start of a method.

[quoted text, click to view]

The fact is, tail. call is not optimized via JIT to a jump currently.
And when you try to debug all this under VS 2005, it lies to you about
the code!

I implemented a similar piece of code to you. I started with this:

---8<---
using System;

class App
{
static double ArithmeticSum(int number, double result)
{
if (number == 0)
return result;
return ArithmeticSum(number - 1, number + result);
}

static void Main()
{
double result = 0;
for (int i = 0; i < 10000; ++i)
result = ArithmeticSum(10000, 1);
Console.WriteLine(result);
}
}
--->8---

I disassembled with ildasm, and rewrote ArithmeticSum to use tail. call:

---8<---
// ...
IL_000c: ldarg.1
IL_000d: stloc.0
IL_000e: br.s exit
// ...
IL_0015: ldarg.1
IL_0016: add

tail. call float64 App::ArithmeticSum(int32, float64)
ret

exit:
ldloc.0
ret
} // end of method App::ArithmeticSum
--->8---

I reassembled with ilasm /debug=OPT and, like you, I found that the
tail-call version was much slower. So, I started "devenv /debugexe
Test.exe", and stepped into the code.

This is what VS says about the JIT compiled code in the disassembly
window:

---8<---
IL_0000: nop
00000000 push ebp
00000001 mov ebp,esp
00000003 push edi
00000004 push esi
00000005 push ebx
00000006 push eax
00000007 fld qword ptr [ebp+8]
IL_0001: ldarg.0
0000000a test ecx,ecx
0000000c setne al
0000000f movzx eax,al
IL_0009: ldloc.1
00000012 test eax,eax
00000014 jne 00000018

IL_000c: ldarg.1
00000016 jmp 00000039

IL_0010: ldarg.0
00000018 mov dword ptr [ebp-10h],ecx
0000001b fild dword ptr [ebp-10h]
0000001e faddp st(1),st
00000020 sub esp,8
00000023 fstp qword ptr [esp]
00000026 dec ecx
00000027 mov eax,dword ptr ds:[00923028h]
0000002d push 2
0000002f push 2
00000031 push 1
00000033 push eax
00000034 call 791B69B0 // <------- NOTE
00000039 pop ecx

exit:
ldloc.0
0000003a pop ebx
0000003b pop esi
0000003c pop edi
0000003d pop ebp
0000003e ret 8
--->8---

The important bit to note is the call to 791B69B0. Even in VS 2005 mixed
mode debugging, it won't let you step into this code. When you try to
step into it, the instruction pointer jumps back to the start of the
method - effectively the call is implementing the tail call, but VS is
"helpfully" hiding the details.

So, its time to crack open SOS: In immediate window:

---8<---
..load sos
!u 791B69B0
--->8---

And I got this:

---8<---
Unmanaged code
791B69B0 F8 clc
791B69B1 ?? db ffh
791B69B2 ?? db ffh
791B69B3 FF0400 inc dword ptr [eax+eax]
791B69B6 0000 add byte ptr [eax],al
791B69B8 0100 add dword ptr [eax],eax
791B69BA 0000 add byte ptr [eax],al
791B69BC 0000 add byte ptr [eax],al
791B69BE 0C02 or al,2
791B69C0 1000 adc byte ptr [eax],al
--->8---

There's something fishy going on here: this code isn't meaningfully
executable!

So, I looked up my dear friend App.ArithmeticSum:

---8<---
!name2ee Test.exe App.ArithmeticSum
Module: 00922c14 (Test.exe)
Token: 0x06000001
MethodDesc: 00922fd8
Name: App.ArithmeticSum(Int32, Double)
JITTED Code Address: 00de0100

!u 00de0100
Normal JIT generated code
App.ArithmeticSum(Int32, Double)
Begin 00de0100, size 41
[quoted text, click to view]
00DE0101 8BEC mov ebp,esp
00DE0103 57 push edi
00DE0104 56 push esi
00DE0105 53 push ebx
00DE0106 50 push eax
00DE0107 DD4508 fld qword ptr [ebp+8]
00DE010A 85C9 test ecx,ecx
00DE010C 0F95C0 setne al
00DE010F 0FB6C0 movzx eax,al
00DE0112 85C0 test eax,eax
00DE0114 7502 jne 00DE0118
00DE0116 EB21 jmp 00DE0139
00DE0118 894DF0 mov dword ptr [ebp-10h],ecx
00DE011B DB45F0 fild dword ptr [ebp-10h]
00DE011E DEC1 faddp st(1),st
00DE0120 83EC08 sub esp,8
00DE0123 DD1C24 fstp qword ptr [esp]
00DE0126 49 dec ecx
00DE0127 8B0528309200 mov eax,dword ptr ds:[00923028h]
00DE012D 6A02 push 2
00DE012F 6A02 push 2
00DE0131 6A01 push 1
00DE0133 50 push eax
00DE0134 E877691B79 call 79F96AB0 (JitHelp:
CORINFO_HELP_TAILCALL)
00DE0139 59 pop ecx
00DE013A 5B pop ebx
00DE013B 5E pop esi
00DE013C 5F pop edi
00DE013D 5D pop ebp
00DE013E C20800 ret 8
--->8---

Here, note the LIE THAT VISUAL STUDIO TOLD!

---8<---
call 79F96AB0 (JitHelp: CORINFO_HELP_TAILCALL)
--->8---

This address, 79F96AB0, is different from the one in VS's most excellent
diassembly view, 791B69B0.

A peek inside this method (which, as we can see from timings, must be
quite expensive and is thus probably pretty complex):

---8<---
!u 79F96AB0
Unmanaged code
79F96AB0 FF155012387A call dword ptr ds:[7A381250h]
79F96AB6 50 push eax
79F96AB7 51 push ecx
79F96AB8 52 push edx
79F96AB9 F6058C44397AFF test byte ptr ds:[7A39448Ch],0FFh
79F96AC0 7409 je 79F96ACB
79F96AC2 F740045F000000 test dword ptr [eax+4],5Fh
79F96AC9 7422 je 79F96AED
79F96ACB 68DDDDDDDD push 0DDDDDDDDh
79F96AD0 68CCCCCCCC push 0CCCCCCCCh--->8---
--->8---

Barry Kelly
5/17/2006 3:23:01 PM
[quoted text, click to view]

This page hints at some reasons:

http://64.233.183.104/search?q=cache:r3DpWXcJOo0J:beta.blogs.msdn.com/shrib/atom.aspx+JIT_TailCall&hl=en&ct=clnk&cd=2

AddThis Social Bookmark Button