harmony-dev mailing list archives

From Alex Astapchuk <alex.astapc...@gmail.com>
Subject Re: [drlvm][jit][ia-32]register-based fast calling convention
Date Tue, 21 Nov 2006 09:44:27 GMT
Rana Dasgupta wrote:
> Thanks for the clarifications Alex.
> On 11/16/06, Alex Astapchuk <alex.astapchuk@gmail.com> wrote:
>> Rana Dasgupta wrote:
>> >>
>> >> So does this mean one specific convention, fastcall, for C helpers and
>> >> a second custom DRLVM convention for managed code?
>> >Right.
>> >I'm going to implement both: the IA-32 fastcall, and one more, new
>> >convention.
>> >The fastcall is indeed *primarily targeted* at C-based helpers; this is
>> >the easiest way: declare a function as '__fastcall' and let the compiler
>> >do the rest of the job.
> I see, so complete __fastcall support then...return in EDX:EAX, preserve
> EDI, ESI, EBP, EBX etc., and leave it to the compiler. Makes sense.
>> >Despite its target, the fastcall can still be used for managed code
>> >if we find it productive.
>> >The reason behind the 'custom' convention is that I'm going to make it
>> >tunable, to see how it fits different workloads.
>> >The parameters that I'm going to make changeable are: the number of GP
>> >registers for args, the number of XMM registers for args, and the number
>> >of callee-save XMMs.
> Tunable only during experimentation, or expose a tunable knob and/or
> multiple annotation choices? One thing to remember is that the existing

I was thinking about command-line switches to set the parameters.
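To make the three knobs concrete, here is a minimal C sketch. Everything in it is hypothetical: the struct, the field names and the "gp=N,xmm=N,xsave=N" switch syntax are made up for illustration and are not existing DRLVM options. It treats IA-32 fastcall as one preset (two GP argument registers, ECX and EDX; FP args on the stack; no callee-save XMMs) and assigns arguments left to right to the next free register of their class, falling back to the stack once that class runs out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical knobs for the tunable convention (names are illustrative). */
typedef struct {
    int gp_arg_regs;      /* GP registers used for integer args */
    int xmm_arg_regs;     /* XMM registers used for FP args */
    int callee_save_xmms; /* XMM registers the callee must preserve */
} CallConvParams;

/* IA-32 __fastcall expressed with the same knobs: ECX and EDX carry the
 * first two 32-bit integer args, FP args go on the stack, and no XMM
 * register is callee-saved. */
static const CallConvParams FASTCALL_PARAMS = { 2, 0, 0 };

typedef enum { ARG_INT32, ARG_FP } ArgKind;
typedef enum { LOC_GP, LOC_XMM, LOC_STACK } LocKind;

typedef struct {
    LocKind kind;
    int index; /* register number within its class, or stack-arg index */
} ArgLoc;

/* Walk the args left to right; each takes the next free register of its
 * class, or the next stack position once that class is exhausted. */
static void assign_args(const CallConvParams *p, const ArgKind *args,
                        int n, ArgLoc *out) {
    int gp = 0, xmm = 0, stack = 0;
    assert(p != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (args[i] == ARG_INT32 && gp < p->gp_arg_regs) {
            out[i].kind = LOC_GP;
            out[i].index = gp++;
        } else if (args[i] == ARG_FP && xmm < p->xmm_arg_regs) {
            out[i].kind = LOC_XMM;
            out[i].index = xmm++;
        } else {
            out[i].kind = LOC_STACK;
            out[i].index = stack++;
        }
    }
}

/* Parse a hypothetical switch value such as "gp=3,xmm=4,xsave=2".
 * Returns 1 if all three numbers were read, 0 otherwise. */
static int parse_params(const char *s, CallConvParams *p) {
    assert(s != NULL && p != NULL);
    return sscanf(s, "gp=%d,xmm=%d,xsave=%d", &p->gp_arg_regs,
                  &p->xmm_arg_regs, &p->callee_save_xmms) == 3;
}
```

With FASTCALL_PARAMS, a signature (int, double, int, int) puts the two leading ints in GP registers 0 and 1 (ECX, EDX) and sends the double and the last int to the stack; with "gp=3,xmm=4,xsave=2" the same signature lands entirely in registers.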

Annotations might be a next or even a parallel step, but they are slightly
unrelated to this particular implementation: currently there is no way to
tell which calling convention a VM helper uses except by looking through
the source code. We have already had some mess with this in the past, and
we may expect more in the future. I suppose Mikhail's work on the helpers
framework will help us address the helper info issue.

> compiler calling conventions have been arrived at in almost exactly the same
> way....trying various options across a broad range of applications and
> choosing the best ones. So some of this work has been done upfront for us.

True, though fastcall (together with stdcall/cdecl) was introduced before
Java grew in popularity, and long before SSE came to the market.
So these conventions may fit well for most apps, while we could find an
even better fit for our particular needs.

> Also a thing to note is that on x64 (at least on Windows) there is a
> single __fastcall convention...almost identical to the ABI. A single,
> efficient convention may sound limiting, but is great for debuggability,
> for example.


>> >> Passing a bounded number of fp args using XMM sounds like a good idea,
>> >> but why callee-save XMMs? My recollection is that the Intel Software
>> >> Developer's Manual recommends caller-saved SSE and SSE2 registers for
>> >> performance, primarily because there are all kinds of optimized move
>> >> instructions to and from XMM registers, like MOVAPS, MOVUPS, MOVAPD,
>> >> etc., for packed/unpacked, single/double precision fp types. The callee
>> >> does not know the datatype in a register. The caller can save only what
>> >> it wants to preserve, using the best move. My recollection is that the
>> >> unaligned move penalties are high.
>> >The optimization guide's recommendation addresses the very generic case.
>> >In a program that mixes all the wealth of SSE/SSE2, the guide's
>> >recommendations may be the best choice.
>> >In our particular case, we completely control the managed code and its
>> >behavior, so we can play with more fine-grained control.
>> >For example, we currently neither use packed types nor do anything
>> >with 128 bits. So we may relax the requirement to preserving only the
>> >lower 64 bits; even a simple MOVQ should fit well.
> We don't yet have a good grasp of all the application types we are dealing
> with. Remember that codegen for some well-known benchmarks may not provide
> all the data. However, MOVQ for the lower 64 bits is reasonable to start
> with, I agree.
>> >The caller knows the type, but the callee knows whether it changes a
>> >particular register; the main reason to play with callee-save XMMs is
>> >*to avoid the need for saving at all*.
>> >Currently, FP-intensive code must spill every XMM register in use
>> >before a call, even if those XMM registers are not touched in the callee.
>> >This is what we would like to avoid: the unnecessary spill code and
>> >memory accesses.
>> >Also, I'm going to make this parameter (the number of callee-save XMM
>> >registers) tunable. If we find it hurts anything, we'll switch it off.
> One could also come up with a reverse argument...the caller needs no state
> to be preserved (it already saves the parameter XMM registers anyway) and
> the callee does a lot of unnecessary work :-) But making this tunable is a
> good idea....till we know.
>> >> I did not fully understand your comment about the resolve_interface()
>> >> helper. In the custom convention (2), is the proposal for all XMM
>> >> registers to be saved by the callee, even if there are no fp operands
>> >> in the method?
>> >Sorry for not being clear.
>> >Actually, the proposal is exactly opposite. :-)
> :-) Good, thanks.
