harmony-dev mailing list archives

From Egor Pasko <egor.pa...@gmail.com>
Subject Re: [drlvm] interface call devirtualization
Date Fri, 14 Jul 2006 03:55:25 GMT
On the 0x1A5 day of Apache Harmony Rana Dasgupta wrote:
>  Nice benchmark. 

I am soo glad ;)

> Yes, the cost of not devirtualizing as well as not hoisting
> ldintfcvt is high. I played around a little with this too and have some
> comments and questions...
> 
>   First some high level stuff....
>   1) What are the instructions like ldintfcvt, ldvfnslot, etc.in the jit
> dump? Are these part of jitrino HIR? 

Exactly! They are part of the High-level IR (HIR).

let's look at these instructions in more detail:
  I28:ldintfcvt t18,cls:Intfc ((t19)) -) t21:vtb:cls:Intfc
  I29:ldvfnslot [t21.Intfc::inc] ((t19)) -) t22:method:inc

I28 and I29 are identifiers of instruction instances (this is an easy one:)

let's skip ((t19)) for simplicity (it shows explicit data
dependency to the corresponding null-check, usually represented as a
special operand in Jitrino.OPT)

*ldintfcvt* = Load Interface Vtable address
takes 2 parameters: (Object) and (Interface_name), 
returns: address of the virtual table 
(every object has a reference to its interface vtables).

You see, '-)' is an arrow :)

*ldvfnslot* = Load Virtual FuNction address SLOT
takes 2 parameters: (virtual table) and (item to search)
returns: address of the address of the method

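*callimem* = Indirect Memory Call
calls the method whose address is stored in the given memory slot
(in the dumps below it takes the result of ldvfnslot)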
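
To make the three instructions above more concrete, here is a toy C++ model
of the sequence ldintfcvt, ldvfnslot, callimem. It is only a sketch: the
types Object, VTable and MethodSlot and the name-based lookups are my
assumptions for illustration. They are not the real DRLVM object layout,
where slots are found by offsets rather than by string search.

```cpp
#include <map>
#include <string>

// Toy model only: all types here are hypothetical, not DRLVM structures.
using Method = int (*)(int);                        // a compiled method
using MethodSlot = Method;                          // a slot stores a method address
using VTable = std::map<std::string, MethodSlot>;   // method name -> slot

struct Object {
    // interface name -> vtable of that interface for the object's class
    std::map<std::string, VTable>* intfcVTables;
};

// ldintfcvt: (object, interface name) -> address of the interface vtable
VTable* ldintfcvt(Object& obj, const std::string& intfc) {
    return &obj.intfcVTables->at(intfc);
}

// ldvfnslot: (vtable, method) -> address of the slot holding the method address
MethodSlot* ldvfnslot(VTable* vtable, const std::string& method) {
    return &vtable->at(method);
}

// callimem: indirect call through the address stored in memory at `slot`
int callimem(MethodSlot* slot, int arg) {
    return (*slot)(arg);
}

int incImpl(int x) { return x + 1; }  // stands in for IntfcImpl::inc

// Performs the whole sequence for a call to Intfc::inc on one object.
int callIncThroughInterface(int start) {
    VTable intfcTable;
    intfcTable["inc"] = &incImpl;
    std::map<std::string, VTable> tables;
    tables["Intfc"] = intfcTable;
    Object obj{&tables};

    VTable* vt = ldintfcvt(obj, "Intfc");      // like I28
    MethodSlot* slot = ldvfnslot(vt, "inc");   // like I29
    return callimem(slot, start);              // the indirect call
}
```

The point of the model is the double indirection: ldvfnslot yields the
address of a slot, and callimem reads the method address from that slot
before calling it.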

> While they seem more or less readable,
> is there any doc describing them ... since they are the first level internal
> representation, and anyone who wants to work with the jit needs to
> understand them?

Yes, the doc would be great! But, there is no document describing HIR
commands at the moment. I am thinking of updating the sources with
self-documenting code. 

For now, the easiest way to find out what an instruction does is to
look in Opcode.cpp, where the opcodeTable array is initialized.

we can find the entries for our 2 instructions:
    { Op_TauLdIntfcVTableAddr,  ..blablabla.. "ldintfcvt", 
        "ldintfcvt %0,%d ((%1)) -) %l",        }, 

    { Op_TauLdVirtFunAddrSlot,  ..blablabla.. "ldvfnslot",
         "ldvfnslot [%0.%d] ((%1)) -) %l",      },

Skip the "Tau" for simplicity and you get more-or-less
self-descriptive names in the first column. What I am thinking of is
adding an extra column filled with more descriptive comments that
could be printed on request, say ... -Xjit print_hir_doc

That should not take too much time, I think. In the meantime you can
always ask what particular instructions mean, and I'll try to explain.
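
A minimal sketch of the "extra column" idea, assuming a separate
hypothetical table (OpcodeDoc, opcodeDocTable, describeOpcode and
printHirDoc are names invented here, not the real opcodeTable in
Opcode.cpp, whose other columns are not reproduced). The doc strings
simply restate the explanations given earlier in this message.

```cpp
#include <cstring>
#include <string>

// Hypothetical entry: mnemonic, print format, and the proposed doc column.
struct OpcodeDoc {
    const char* mnemonic;
    const char* format;
    const char* doc;
};

static const OpcodeDoc opcodeDocTable[] = {
    { "ldintfcvt", "ldintfcvt %0,%d ((%1)) -) %l",
      "Load Interface Vtable address: (object, interface) -> vtable address" },
    { "ldvfnslot", "ldvfnslot [%0.%d] ((%1)) -) %l",
      "Load Virtual Function address Slot: (vtable, method) -> slot address" },
};

// Returns the doc string for a mnemonic, or a placeholder if none exists.
const char* describeOpcode(const char* mnemonic) {
    for (const OpcodeDoc& e : opcodeDocTable) {
        if (std::strcmp(e.mnemonic, mnemonic) == 0) return e.doc;
    }
    return "(undocumented)";
}

// Builds the text a "print docs on request" option could emit:
// one "mnemonic: doc" line per table entry.
std::string printHirDoc() {
    std::string out;
    for (const OpcodeDoc& e : opcodeDocTable) {
        out += e.mnemonic;
        out += ": ";
        out += e.doc;
        out += "\n";
    }
    return out;
}
```

Keeping the doc string next to the mnemonic and format means a new
instruction cannot be added to the table without someone at least
noticing the empty description.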

>   2) When experimenting with the JIT related command line options to
> ij.exe-Xjit...I found many of them listed in the vm/doc/GettingStarted
> guide...Just FYI for interested folks.

yeah, and many undocumented ones, specific to optimization
passes. They are easy to find in functions like readFlagsFromCommandLine

> On 10 Jul 2006 22:44:52 +0700, Egor Pasko <egor.pasko@gmail.com> wrote:
> > > I looked through the list of TODO projects for JIT [1] and
> > > decided to write a >microbenchmark detecting how good interface
> > > call devirtualization works in JIT >(see below)
> >
> > >Jitrino.OPT showed very-very slow (~2.5 times slower than JRockit (1.5
> > /linux)).
> >
> > > A strange thing, "interface call devirtualization"
> > >would have boosted JRockit's performance too (I checked that with a
> > >slightly changed benchmark).
> 
> 
>   Yes, this optimization would have helped here...I also converted this
> interface dispatch effectively to a virtual dispatch in your test and the
> performance significantly improves with the resultant devirtualization...
> 

yes, I can comment on your piece of dump

> Block L8:
>   Predecessors: L7
>   Successors: L11 L9
>   I74:L8:
>   I40:ldvtable  t13 ((t27)) -) t28:vtb:cls:IntfcImpl
>   I41:getvtable cls:IntfcImpl -) t29:vtb:cls:IntfcImpl
>   I42:if ceq.vtb t28, t29 goto L11
>   GOTO L9

this is an explicit condition equivalent to:
(if object t13 is of class IntfcImpl) 
// it is implemented via comparing corresponding virtual table addresses
So, this is where the guarded devirtualization of virtual calls shows itself.

> Block L9:
>   Predecessors: L8
>   Successors: L14 UNWIND
>   I37:L9:
>   I43:tauhastype      t13,cls:IntfcImpl -) t30:tau
>   I44:ldvfnslot [t28.IntfcImpl::inc] ((t27)) -) t31:method:inc
>   I46:callimem  [t31](t13) ((t27,t30)) -)
>   GOTO L14

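This is the guarded (fallback) path :) If the check in L8 fails, the
method is called indirectly through its vtable slot.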

> Block L11:
>   Predecessors: L8
>   Successors: L12 UNWIND
>   I38:L11:
>   I48:--- IntfcImpl::inc: ()
>   I49:chknull   t13 -) t32:tau
>   GOTO L12

This is the devirtualized path: you can see that "IntfcImpl::inc" is
inlined here. Whether inlining happens is decided by a set of
heuristics, such as the size of the inlined bytecode...
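
The three blocks of the dump (L8, L9, L11) can be sketched in C++ roughly
as follows. This is an illustration under assumptions: the VTable and
Object structs, incOther, and the body "x + 1" for IntfcImpl::inc are made
up here; only the shape (compare vtable addresses, then either the inlined
body or the indirect call) follows the dump.

```cpp
// Hypothetical one-slot vtable; real vtables hold many entries.
struct VTable {
    int (*inc)(int);
};

int incImpl(int x)  { return x + 1; }  // stands in for IntfcImpl::inc
int incOther(int x) { return x + 2; }  // some other implementation

const VTable intfcImplVTable{&incImpl};
const VTable otherVTable{&incOther};

struct Object {
    const VTable* vtable;  // every object refers to its class vtable
};

int callInc(const Object& obj, int x) {
    // Block L8: ldvtable + getvtable + "if ceq.vtb": is obj an IntfcImpl?
    if (obj.vtable == &intfcImplVTable) {
        // Block L11: the body of IntfcImpl::inc, inlined
        return x + 1;
    }
    // Block L9: ldvfnslot + callimem, the ordinary indirect call
    return obj.vtable->inc(x);
}
```

Both paths must give the same result for an IntfcImpl object; the guard
only decides whether the cheap inlined copy may be used.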

> >So, that would be interesting to implement it!
> >
> > >Seems like the best choice is to start from a couple of easy heuristics:
> > >* if there is only one loaded class to implement the interface, choose it
> > >* if there are more, choose the one with it's method invoked earlier
> > (compiled
> > >by some JIT, possibly, some other JIT),
> 
>   If we forget the profile guidance for now, could you please elaborate more
> about how we should do this and on what exactly is happening now? 

OK, I just need to check my ideas. In brief, I see 2 places where we
could devirtualize: in the Translator and in the High-Level Optimizer.
Each has its own benefits. The Translator is faster, but less
heuristic-oriented (it can rely only on bytecode size).
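
For reference, the first two heuristics quoted above (a single loaded
implementor, otherwise the implementor whose method was compiled
earliest) could look roughly like this. It is a sketch only: LoadedClass,
its fields and chooseDevirtTarget are hypothetical names, and how the JIT
would actually query the VM for loaded classes is not shown.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of what the JIT knows about one loaded class.
struct LoadedClass {
    std::string name;
    bool implementsIntfc;  // does the class implement the interface?
    long compileOrder;     // when its method was compiled; -1 if never
};

// Returns the class to guard on, or nullptr if no heuristic applies.
const LoadedClass* chooseDevirtTarget(const std::vector<LoadedClass>& loaded) {
    const LoadedClass* last = nullptr;      // last implementor seen
    const LoadedClass* earliest = nullptr;  // earliest compiled implementor
    int implementors = 0;
    for (const LoadedClass& c : loaded) {
        if (!c.implementsIntfc) continue;
        ++implementors;
        last = &c;
        if (c.compileOrder >= 0 &&
            (earliest == nullptr || c.compileOrder < earliest->compileOrder)) {
            earliest = &c;
        }
    }
    // Heuristic 1: exactly one loaded implementor, choose it.
    if (implementors == 1) return last;
    // Heuristic 2: several implementors, choose the earliest compiled one.
    return earliest;
}
```

A nullptr result means "do not devirtualize": with several candidates
and none compiled yet, there is nothing to base a guess on.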

> BTW, do we currently raise the IncompatibleClassChangeError if the
> objectref's class does not actually support the interface?

Yes, AFAIR, IncompatibleClassChangeError works; there should be a
special vtable entry for that. I am not sure, but I can check if it helps.

> Do we cache the interface tables per class object and can we improve
> this cache search in the optimization? 

This is a kind of optimization in VM core. The original proposal was
to implement an optimization in Jitrino.OPT.

> In non trivial cases where many classes implement the same
> interface, the cache search may be more expensive than the slot look
> up.

hm, we could even store slots in objects, but this is not easy to do
:) and the performance impact is not obvious in all cases :(

>   We could also virtualize and then devirtualize the interface invocation
> when we can...somewhat like the jit dump above. What do you think?

yes, that's right, make a virtual call (guarded) and then rely on
guarded devirtualizer. Good and simple. Thanks!

> > >* if we have many candidate methods that are compiled, choose the most
> > frequent
> > >one (need a method-entry profile, the feature is likely to stay untouched
> > for
> > >a while, I guess)
> >
> > > 3. Should I create a JIRA for the issue ASAP? :)
> 
>   I would say yes, let's create a JIRA issue

OK, TBD (sorry, I am soo.. busy today:)

-- 
Egor Pasko, Intel Managed Runtime Division

