harmony-dev mailing list archives

From "Mikhail Fursov" <mike.fur...@gmail.com>
Subject Re: [drlvm][jit][opt] HARMONY-3243 It's possible to fill array with constant much faster
Date Thu, 15 Mar 2007 20:26:27 GMT
On 15 Mar 2007 19:58:54 +0300, Egor Pasko <egor.pasko@gmail.com> wrote:
> this should hypothetically improve one simple code pattern (that is
> probably widely used?), i.e.: for (int i=0;i<I;i++){A[i]=X;}
> What I figured out looking at the patch:
> * [pass.2] does not seem to throw any ArrayIndexOutOfBoundsException
> * [pass.2] does not have any useful tuning parameters such as number
>            of unrolls per loop, thus the scheme eats potential benefit
>            from loop unrolling and does not give any tuning back

AFAIK this optimization is much more efficient than loop unrolling on
the microtest you mentioned.

> * [pass.1] detects such a rare pattern that I doubt it would benefit a
>            user (but obviously will benefit a you-know-which-benchmark
>            runner)

Generic Arrays.fill-like methods could be optimized this way.
Also, IMO even a few percent in widely known
benchmarks is reason enough to implement even more complicated optimizations.
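
For reference, a minimal sketch of the loop shape under discussion next to its library equivalent. The class and method names (FillPattern, fillLoop, fillLibrary) are hypothetical and are not taken from the HARMONY-3243 patch; the sketch only illustrates the source-level pattern, not how the JIT recognizes it.

```java
import java.util.Arrays;

// Hypothetical illustration; not code from the patch.
final class FillPattern {
    // The loop shape from the quoted microtest: store one value into a[0..n-1].
    static void fillLoop(int[] a, int n, int x) {
        for (int i = 0; i < n; i++) {
            a[i] = x;
        }
    }

    // Library form with the same result whenever 0 <= n <= a.length.
    static void fillLibrary(int[] a, int n, int x) {
        Arrays.fill(a, 0, n, x);
    }

    public static void main(String[] args) {
        int[] p = new int[8];
        int[] q = new int[8];
        fillLoop(p, 5, 42);
        fillLibrary(q, 5, 42);
        // Both arrays should hold the same contents.
        System.out.println(Arrays.equals(p, q));
    }
}
```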

> * [pass.1] has a lot of new code that introduces potential instability
>            (if the pattern was detected not properly, the code does
>            not read easily), but does not contain a single unit test
>            or the like. Together with AIOOBE issue stability becomes a
>            real question.

All known bugs can be fixed. If the AIOOBE is a real problem here, it looks
easy to fix too. The question is whether the optimization gives any benefit
or not.
We can move it into a separate HLO pass (and a separate
file) and drop it from the codebase if it is not needed in the future.
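
To make the AIOOBE concern concrete: by Java semantics, a fill loop whose bound exceeds the array length must still store every in-range element and only then throw. A bulk fill that replaces the loop has to preserve that behavior. The sketch below shows only the required source-level behavior; the class and method names are hypothetical, and it says nothing about what the patch actually does.

```java
// Hypothetical illustration of the behavior an optimized fill must preserve.
final class FillBoundsSemantics {
    // Runs the plain fill loop and returns how many stores completed
    // before the loop ended, normally or by exception.
    static int fillAndCount(int[] a, int n, int x) {
        int written = 0;
        try {
            for (int i = 0; i < n; i++) {
                a[i] = x;      // throws once i reaches a.length
                written++;
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Reached only when n > a.length; all in-range stores already happened.
        }
        return written;
    }

    public static void main(String[] args) {
        int[] a = new int[3];
        int written = fillAndCount(a, 5, 9);
        System.out.println(written + " elements written before the exception");
    }
}
```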

> * back branch polling is not performed (which is probably good for
>   performance, but I would better have a tuning option)

Do you think the latency of a memcpy-like operation can be a problem here?

> What I can say more is that a good "ABCD" optimization complemented
> with "loop versioning" optimization will make a more readable, more
> stable code, AND will give a better performance gain (loop unrolling
> is awake too). Setting aside the fact that the overall design will be
> more straightforward (having no interdependent passes, extra helpers, etc)

> So I vote for focusing on ABCD plus "loop versioning" and leaving
> specific benchmark-oriented tricks (complicating our design) alone.

I support focusing on loop
versioning/ABCD and other general-purpose optimizations we do not have today.
But until these optimizations give better results on your microtest, we
can use Nikolay's approach. At least it works better today.
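
For readers unfamiliar with the term, here is a rough source-level sketch of what "loop versioning" means for the fill loop. A JIT would do this on its intermediate representation and drop the per-iteration bounds checks in the fast version; Java source cannot express that removal, so the two loop bodies below look identical. The names are hypothetical and this is not the design of any existing Harmony pass.

```java
// Hypothetical source-level picture of loop versioning; a real JIT works on IR.
final class LoopVersioningSketch {
    static void fillVersioned(int[] a, int n, int x) {
        if (a != null && n <= a.length) {
            // Fast version: i starts at 0 and stays below n <= a.length,
            // so every access is provably in range and a compiler could
            // omit the per-iteration bounds check here.
            for (int i = 0; i < n; i++) {
                a[i] = x;
            }
        } else {
            // Slow version: the original loop with all checks intact, so
            // NullPointerException and ArrayIndexOutOfBoundsException occur
            // exactly where the unoptimized code would raise them.
            for (int i = 0; i < n; i++) {
                a[i] = x;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = new int[4];
        fillVersioned(a, 4, 1);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

The single up-front test is what keeps the exception behavior intact: any input that could fault falls through to the unmodified loop.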

Mikhail Fursov
