harmony-dev mailing list archives

From Alex Astapchuk <alex.astapc...@gmail.com>
Subject Re: [drlvm][jit][ia-32]register-based fast calling convention
Date Fri, 17 Nov 2006 03:57:57 GMT
Hi Rana,

Thank you for your comments. Please find my answers inline.

Rana Dasgupta wrote:
> Hi Alex,
>    This is good, thanks. Please see below...
> On 11/15/06, Alex Astapchuk <alex.astapchuk@gmail.com > wrote:
>> Hi all,
>> Among other things listed on the JIT Dev tasks, there is a need for a
>> calling convention (CC) fix-up for IA-32 [1].
>> Current problems are:
>> 1. The calling convention(s) used are stack-based - this adds memory
>> access overhead on calls.
>> 2. The convention currently used for managed code neither allows passing
>> floating-point values in XMM registers nor provides callee-saved XMM
>> registers.
>> 3. The FPU stack is used to return float/double values.
>> So, I'm going to implement a register-based calling convention for IA-32.
>> The current proposal is:
>>    - make it possible to switch between the existing and new conventions
>>       for investigation and tuning purposes
>  So does this mean one specific convention, fastcall, for C helpers and a
> second custom DRLVM convention for managed code?

I'm going to implement both: the IA-32 fastcall, and one more convention
of our own.
The fastcall is indeed *primarily targeted* at C-based helpers - it is
the easiest way: declare a function as '__fastcall' and let the compiler
do the rest of the job.
Despite its target, fastcall can still be used for managed code
if we find it productive.

The reason behind the 'custom' convention is that I'm going to make it 
tunable - to see how it fits into different workloads.

The parameters I'm going to make configurable are: the number of GP
registers for args, the number of XMM registers for args, and the number
of callee-save XMMs.
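To make these three knobs concrete, here is a minimal C sketch of how a JIT might assign arguments to locations for a given setting. All names (`cc_config`, `assign_args`, the `ARG_*`/`LOC_*` enums, `CC_CUSTOM_EXAMPLE` and its values) are hypothetical, not DRLVM code. It assumes only 32-bit integers and doubles, GP registers taken in the order ECX, EDX, EAX, and stack slots laid out left to right. With `{2, 0, 0}` it approximates the standard IA-32 fastcall from the quoted proposal, where doubles always go on the stack.

```c
#include <assert.h>

/* Hypothetical description of a tunable IA-32 convention (not DRLVM's API). */
typedef struct {
    int num_gp_args;         /* GP registers for integer args, 0..3 (ECX, EDX, EAX) */
    int num_xmm_args;        /* XMM registers for FP args, 0..8 */
    int num_callee_save_xmm; /* XMMs the callee must preserve; not used for arg assignment */
} cc_config;

typedef enum { ARG_I32, ARG_F64 } arg_kind;
typedef enum { LOC_GP, LOC_XMM, LOC_STACK } loc_kind;

typedef struct {
    loc_kind kind;
    int index; /* LOC_GP: 0=ECX, 1=EDX, 2=EAX; LOC_XMM: XMM number; LOC_STACK: byte offset */
} arg_loc;

/* Assign each argument, left to right, to the next free register of its
 * class, or to the next stack slot. Returns the number of stack bytes used. */
static int assign_args(const cc_config *cc, const arg_kind *args, int n, arg_loc *out)
{
    assert(cc->num_gp_args >= 0 && cc->num_gp_args <= 3);
    assert(cc->num_xmm_args >= 0 && cc->num_xmm_args <= 8);

    int gp = 0, xmm = 0, stack = 0;
    for (int i = 0; i < n; i++) {
        if (args[i] == ARG_I32 && gp < cc->num_gp_args) {
            out[i].kind = LOC_GP;
            out[i].index = gp++;
        } else if (args[i] == ARG_F64 && xmm < cc->num_xmm_args) {
            out[i].kind = LOC_XMM;
            out[i].index = xmm++;
        } else {
            out[i].kind = LOC_STACK;
            out[i].index = stack;
            stack += (args[i] == ARG_F64) ? 8 : 4; /* double: 8 bytes, int: 4 bytes */
        }
    }
    return stack;
}

/* fastcall-like: ECX+EDX for ints, everything else on the stack. */
static const cc_config CC_FASTCALL = { 2, 0, 0 };
/* One hypothetical setting of the custom convention. */
static const cc_config CC_CUSTOM_EXAMPLE = { 3, 4, 2 };
```

For a signature `(int, double, int, int)`, the fastcall-like setting puts the first two ints in ECX and EDX and sends the double and the last int to the stack (12 bytes), while the example custom setting passes all four arguments in registers.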

>>    - implement 2 calling conventions:
>>       1. the well-known standard fastcall (first 2 params in ECX+EDX, the
>>       rest on the stack)
>>       2. a DRLVM-specific convention, which uses ECX, EDX (and maybe
>>       EAX) for integer parameter passing, also uses XMMs for
>>       floating-point parameters, and provides callee-save XMMs.
> Passing a bounded number of fp args using XMM sounds like a good idea, but
> why callee-saved XMMs? My recollection is that the Intel Software
> Development Manual recommends caller-saved SSE and SSE2 registers for
> performance, primarily because there are all kinds of optimized move
> instructions to and from XMM registers, like MOVAPS, MOVUPS, MOVAPD, MOVDQA
> etc., for packed/unpacked, single/double precision fp types. The callee
> does not know the datatype in a register. The caller can save only what it
> wants to preserve, using the best move. My recollection is that the
> unaligned move penalties are high.

The optimization guide's recommendations target the most generic case.
In a program that mixes all the wealth of SSE/SSE2, following the guide
may well be the best choice.

In our particular case, we completely control the managed code and its
behavior, so we can afford more fine-grained control.
For example, we currently neither use packed types nor do anything with
the full 128 bits. So we may relax the requirement and preserve only the
lower 64 bits - even a simple MOVQ should fit well.
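A minimal sketch of the "lower 64 bits only" idea, using SSE2 intrinsics (so x86/x86-64 only): a scalar double lives in the low lane of an XMM register, so a MOVQ-style 64-bit store and reload (`_mm_storel_epi64` / `_mm_loadl_epi64`) preserves it even though the upper lane is zeroed. The function name and the 8-byte spill slot are illustrative assumptions, not DRLVM code.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; x86/x86-64 only */
#include <stdint.h>

/* Spill and reload an XMM value through an 8-byte slot, as MOVQ would. */
static double roundtrip_low64(double x, double upper_junk)
{
    /* _mm_set_pd(hi, lo): the scalar goes in the low lane, junk in the high lane. */
    __m128d reg = _mm_set_pd(upper_junk, x);

    uint64_t slot; /* 8-byte spill slot instead of a 16-byte one */
    _mm_storel_epi64((__m128i *)&slot, _mm_castpd_si128(reg)); /* MOVQ m64, xmm */

    __m128d restored =
        _mm_castsi128_pd(_mm_loadl_epi64((const __m128i *)&slot)); /* MOVQ xmm, m64 */

    /* The reload zeroes the high lane: the junk is gone, and scalar code never needed it. */
    assert(_mm_cvtsd_f64(_mm_unpackhi_pd(restored, restored)) == 0.0);
    return _mm_cvtsd_f64(restored);
}
```

Scalar FP code only ever reads the low lane, so `roundtrip_low64(3.25, 999.0)` gives back 3.25; only code that relied on the upper 64 bits would notice the difference.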

The caller knows the type, but the callee knows whether it changes a 
particular register - the main reason to play with callee-save XMMs is 
*to avoid the need for saving at all*.

Currently, FP-intensive code must spill every used XMM register before
a call, even if the callee does not touch those registers.

This is what we would like to avoid - the unnecessary spill code and 
memory accesses.

Also, I'm going to make this parameter (the number of callee-save XMM
registers) tunable. If we find it hurts anything, we'll switch it off.
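The trade-off can be put in numbers with a toy cost model, a sketch under stated assumptions rather than a measurement: count the stores plus reloads spent on XMM state around a single call. It assumes 8 XMM registers, that the top `k` of them (XMM(8-k)..XMM7) are the callee-save ones, and that the caller already keeps values live across the call in those registers where possible. All names are hypothetical.

```c
#include <assert.h>

#define NUM_XMM 8 /* XMM0..XMM7 on IA-32 */

/* Number of registers set in a register bitmask. */
static int count_regs(unsigned mask)
{
    int n = 0;
    for (; mask != 0; mask &= mask - 1) /* clear the lowest set bit */
        n++;
    return n;
}

/* Hypothetical layout: the top k registers, XMM(8-k)..XMM7, are callee-save. */
static unsigned callee_save_mask(int k)
{
    assert(k >= 0 && k <= NUM_XMM);
    return (0xFFu << (NUM_XMM - k)) & 0xFFu;
}

/* Stores plus reloads spent preserving XMM state around one call.
 *   live:      XMMs holding caller values still needed after the call
 *   clobbered: XMMs the callee writes */
static int xmm_spill_ops(unsigned live, unsigned clobbered, int num_callee_save)
{
    unsigned cs = callee_save_mask(num_callee_save);
    /* Caller-save registers: the caller spills and reloads each live one. */
    int caller_ops = 2 * count_regs(live & ~cs & 0xFFu);
    /* Callee-save registers: the callee saves/restores only those it clobbers. */
    int callee_ops = 2 * count_regs(clobbered & cs);
    return caller_ops + callee_ops;
}
```

With values live in XMM6/XMM7 and a callee that touches no XMMs, the model gives 4 memory operations with `k = 0` and none with `k = 2`. Conversely, a callee that clobbers XMM6/XMM7 while the caller has nothing live pays 4 operations with `k = 2` and none with `k = 0`, which is why the count is worth keeping tunable.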

> I did not fully understand your comment about the resolve_interface()
> helper. In the custom convention (2), is the proposal for all XMM registers
> to be saved by the callee, even if there are no fp operands in the method?

Sorry for not being clear.
Actually, the proposal is exactly opposite. :-)

I mentioned resolve_interface() as an example of code where the XMMs
are [most likely] not touched, so there is no need to spill them.

