harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Astapchuk <alex.astapc...@gmail.com>
Subject Re: [drlvm][jit][opt] HARMONY-3243 It's possible to fill array with constant much faster
Date Fri, 16 Mar 2007 05:04:47 GMT
Hi Egor,

Thank you for your valuable comments. Please, find the answers inlined.

Egor Pasko wrote:
> JIT guys,
> 
> ...on the HARMONY-3243 patch I would like an open discussion in dev@
> rather than a limited one in JIRA.
> 
> first of all, the latest faf_em64t.patch crashes for me with the message:
> 
> java: /home/xxx/svn/1/trunk/working_vm/vm/jitrino/src/shared/LoopTree.h:85: bool Jitrino::LoopTree::hasLoops()
const: Assertion `isValid()' failed.
> SIGABRT in VM code.
> Stack trace:
> addr2line: '[vdso]': No such file
> (hangs here)

Could you please add the steps to reproduce into the JIRA?

> next, with all great respect to Nikolay Sidelnikov for his efforts I
> think, this idea needs a more careful analysis and probably
> reimplementation.
> 
> The story is: 2 new optimization passes introduced: one in High-level
> Optimizer nd one in Code Generator:
> pass.1 recognizes a simple loop initialization with a simple pattern,
>        substitutes it with a JIT helper call
> pass.2 takes the corresponding call and substitutes it with a
>        Low-Level-IR as it thinks the best code is
> 
> this should hypothetically improve one simple code pattern (that is
> probably widely used?), i.e.: for (int i=0;i<I;i=++){A[i]=X;}
> 
> What I figured out looking at the patch:
> 
> * [pass.2] does not seem to throw any AIOutOfBoundsException

It moves bound checks outside of the loop and does the 'fast' loop
only when the check guarantees that no outofbounds happens. Otherwise,
it falls into a 'regular' version.

> * [pass.2] does not have any useful tuning parameters such as number
>            of unrolls per loop, thus the scheme eats potential benefit
>            from loop unrolling and does not give any tuning back

The optimization is mostly orthogonal to the loop unrolling - see
also below.

> * [pass.1] detects such a rare pattern that I doubt it would benefit a
>            user (but obviously will benefit a you-know-which-benchmark
>            runner)

It quite common for many string intensive applications, including
XML-oriented.
I would suppose it's even more common than 'static array initialization'
pattern that is explicitly handled in the translator:
you may have a look into
jitrino/src/translator/java/JavaByteCodeTranslator.cpp,
methods JavaByteCodeTranslator::newarray(), checkForArrayInitializer().

> * [pass.1] has a lot of new code that introduces potential instability

Could you please clarify more about instability - what do you mean exactly?

>            (if the pattern was detected not properly, the code does
>            not read easily), but does not contain a single unit test
>            or the like. Together with AIOOBE issue stability becomes a
>            real question.
> 
> * back branch polling is not performed (which is probably good for
>   performance, but I would better have a tuning option)

BBP is useless (and even harmful) on short finite loops for various
reasons - it adds a memory access (a flag check), a branch, and a live
range. For hot short loops it's overkill.

Throwing away the BBP is one of the goals for the optimization...

> What I can say more is that a good "ABCD" optimization complimented
... and ABCD does nothing with BBP, isn't it?

> with "loop versioning" optimiztion will make a more readable, more
> stable code, AND will give a better performance gain (loop unrolling
Unfortunately no, it will not. :-(

As you probably know ;-), char is 16 bits wide in Java.
The code generated for char moves have to use 16 bit movement
instructions. These instructions include operand-size change prefix
(66h) that makes CPU decoder feel bad. Whatever unrolling or versioning
would leave these heavy instructions on place.
One of the goals was to throw the heavy instructions away and replace
them with more effective and fast ones. It's somehow hard to do in
codegen separately, but much easy (and clearly) with a little help from
the HLO side - this is rationale for the HLO pass.

> is awake too). Setting aside the fact that the overall design will be
> more straightforward (having no interdependent passes, extra helpers, etc)
> 
> So I vote for focusing on ABCD plus "loop versioning" and leaving 
> specific benchmark-oriented tricks (complicating our design) alone.

Again, the optimization is orthogonal to the ABCD and was never
positioned as replacement for ABCD. Optimization's main target are
string (and XML) intensive apps.

> An experienced hacker would say that all compiler reasearch is a bunch
> of hacks that influence each other in unexpected ways. Well, maybe,
> but I do not like this particular hack (with all respect to Nikolay
> for his efforts)

In no way this is a hack. If you see a better room for the improvements 
here, patches are always welcome. :-)

-- 
Yours faithfully,
      Alex


Mime
View raw message