From Egor Pasko <egor.pa...@gmail.com>
Subject Re: [drlvm][jit][opt] HARMONY-3243 It's possible to fill array with constant much faster
Date Tue, 20 Mar 2007 16:22:34 GMT
On the 0x29F day of Apache Harmony Alex Astapchuk wrote:
> Hi Egor,
> 
> Thank you for your reply. Please find my answers inlined.

[snip]

> >>>> Such short loops take just dozens of clock ticks to finish, while
> >>>> thread suspension involves a much heavier code sequence, including
> >>>> spin loops, waiting on events, and context switches. They are just
> >>>> not comparable in time.
> >>>> Please correct me if my estimates are wrong.
> >>> Alex, let's then put a limit of 80 (or
> >>> MAX_ALLOWED_CHARS_FOR_ARRAY_FILL) chars on this optimization. There is
> >>> no such limit currently.
> >> Sorry, did not get it. For what reason?
> > fillArray may gently touch a long loop. This has a negative effect on
> > suspension time.
> 
> 'has'?
> Do you have any estimate on what you call 'negative effect', please?
> 
> Ok, back to reality :-)
> Filling a string of 65K 16-bit chars (which is much longer than real
> strings) takes about 0.00001 s even on a slow Pentium *M* / 1.7 GHz.
> This is far from the 'effect'.
> The time is roughly O(n) - filling a string of 65M chars
> (which in fact is 128M bytes) takes 0.14 s.
> On powerful server-class boxes the numbers will be even smaller.
> 
> Taking even these numbers into account, the effect is near-zero.
> On an app playing with 128MB thingies, the thread suspension time is
> far from dominating or even significant. It will spend much more on
> allocations, collections, accesses to these, etc.

OK, let's do the arithmetic.

How large can a Java array be? Let's take an array of longs: 2^31 * 8
bytes, which is about 17 GB. If 65K takes 10^(-5) s to fill, then 17 GB
takes something like 2.6 s.
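
The same back-of-envelope extrapolation in code, just so the scaling
assumption is explicit. It assumes fill time grows linearly with the byte
count and treats the '65K' above as 64K bytes; if that figure meant 64K
16-bit chars, the estimate halves, which does not change the point:

public class FillTimeEstimate {
    public static void main(String[] args) {
        double measuredSeconds = 1e-5;             // the ~0.00001 s figure quoted above
        double measuredBytes = 64 * 1024;          // treating '65K' as 64K bytes
        double maxLongArrayBytes = Math.pow(2, 31) * 8;  // 2^31 longs * 8 bytes, ~17 GB
        // linear extrapolation: time scales with the number of bytes written
        double estimate = measuredSeconds * (maxLongArrayBytes / measuredBytes);
        System.out.printf("worst-case fill estimate: ~%.1f s%n", estimate);  // ~2.6 s
    }
}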

Copying large amounts of data is also somewhat slower than copying
isolated small chunks (which can probably fit in cache and cause no TLB
misses or swapping (oh, swapping is an interesting area of
investigation, btw)).

A bit of an extreme case, of course..
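
A quick way to check both the linear scaling and the cache effect would
be something like the sketch below. It is deliberately naive (no warmup,
arbitrary sizes and rep counts picked by me, needs a heap of 1 GB or so),
so treat the numbers as an illustration, not a benchmark:

public class FillScaling {
    static double mbPerSecond(long[] a, int reps) {
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            java.util.Arrays.fill(a, 0x55L);       // the fill we keep talking about
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (double) a.length * 8 * reps / (1024.0 * 1024.0) / seconds;
    }

    public static void main(String[] args) {
        long[] small = new long[32 * 1024];        // ~256 KB, likely cache-resident
        long[] large = new long[64 * 1024 * 1024]; // ~512 MB, certainly not
        System.out.printf("small fill: %.0f MB/s%n", mbPerSecond(small, 4096));
        System.out.printf("large fill: %.0f MB/s%n", mbPerSecond(large, 2));
    }
}

If the large fill does not come out noticeably slower per byte on a given
box, then my cache/TLB point above is moot there.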

> >>>>> do not kill each other. And tuning parameters for the genetic
> >>>>> algorithm to choose non-obvious heuristics.
> >>>>>
> >>>>>> Throwing away the BBP is one of the goals for the optimization...
> >>>>> loop finiteness is easily detected in this case. And I would prefer
> >>>>> more general algorithms that optimize a variety of situations. This
> >>>>> specific optimization detects only one specific case, has a lot of
> >>>>> code, and is not properly tested. Are we still targeting stability?
> >>>> Hmmmm.... Could you please be more specific? 'a lot of code' is kind
> >>>> of an emotional characteristic. :-)
> >>>>
> >>>> What makes you think it was not tested properly?
> >>> no tests in JIRA
> >> There are many other tests performed during feature preparation.
> >> These include the 'build test', mini- and other benchmarks, and other apps.
> >>
> >> Also, as you see, your quick-look presumption about a missing OutOfBounds
> >> check was also incorrect - all the existing machinery is still in place.
> >> But if you have any suggestion on better testing in general, it's
> >> always welcome.
> > I have a suggestion to reflect testing efforts in JIRA. Reasonable? I
> > think, yes.  Unit tests are ALWAYS welcome. Specific corner cases are
> > always welcome to test. I doubt the 'build test' and other benchmarks and
> > other apps test fillArray in a good way. If somebody invented new
> > microbenchmarks to test on, they should be in JIRA.
> > Makes sense?
> 
> Yes, all of this sounds reasonable.
> I was just mistakenly dazzled by the categorical claim that the
> optimization 'was not tested properly' - sorry about that.

Thank you, and I am sorry for any inconvenience.

[snip]

> >>> 2. did you measure the actual difference?
> >> YES!
> >> Reading the JIRA shows a few-times speedup on the micro benchmark.
> >> As is common for micro benchmarks, YMMV on real apps depending on the
> >> characteristics of the workload.
> > To be specific: did you measure the long encoding against the short
> > encoding with all the rest unchanged? If yes, and the difference was
> > significant, then cool. That was the question.
> 
> Aha, I see now.
> 
> Unfortunately, what you're asking about is hard to measure directly.
> What you call 'long encoding against short encoding' in fact means
> replacing 16-bit operations with 32-bit ones (and I personally would
> hesitate to compare these directly).
> 
> We did not start the implementation with just a hope that 'it will help
> 'cause it's better'.
> We did measurements first, to find out the sources of inefficiency.
> 
> Among others, VTune showed significant stalls due to LCP in our code.
> The reworked code shows zero LCP stalls and works much faster.
> I would presume that most of the speedup comes from the better
> instructions. Our experiments showed a pretty low boost from BBP removal
> itself (if I'm not mistaken, it was about 30%, but not 'times').
> Unrolling games gave almost nothing.
> 
> Anyway, a simple switch from 16-bit operations to 32-bit ones gives
> roughly a 2x speedup on micro benchmarks (for instance, from 1.98 s to
> 1.00 s on the same Pentium M as above), though again, I would not compare
> these numbers directly.

Great! Thanks! It is both interesting and cool!
Processors suck.
Is it possible to always use the short encodings? Or does it have negative sides?
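
Just to check my reading of the patch (this is only my illustration of
the idea at the Java level, not the actual Jitrino code): the 16-bit fill
value gets duplicated into a 32-bit pattern, so the generated loop can do
plain 32-bit stores and never needs the 0x66 operand-size prefix that
causes the LCP stalls VTune complained about:

// Illustration only: how a 16-bit char value can be widened into 32- and
// 64-bit store patterns. The real work happens in the JIT-emitted machine
// code; plain Java cannot alias a char[] as an int[] anyway.
public class WidenFillPattern {
    static int pattern32(char c) {
        return (c << 16) | c;                 // two copies of the 16-bit value
    }

    static long pattern64(char c) {
        long p = pattern32(c) & 0xFFFFFFFFL;
        return (p << 32) | p;                 // four copies, for 64-bit stores
    }

    public static void main(String[] args) {
        System.out.printf("0x%08x%n", pattern32('x'));   // 0x00780078
        System.out.printf("0x%016x%n", pattern64('x'));  // 0x0078007800780078
    }
}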

[snip]

> >> Did you say 'design'? How come a single simple optimization pass
> >> affects the whole of Jitrino's design? You must be kidding...
> >>
> >> Could you please be more specific on what you are talking about?
> > Careful reading can unveil it. I pointed out:
> > 1. the extra helper (one extra helper does not make a design very
> >    complicated, but I would generally avoid doing unnecessary things)
> 
> Hmm... Just to make it clear: in no way is this, or will it become, a
> part of Jitrino's design.
> This is the implementation of a particular optimization pass.
> 
> You can find the Jitrino *design* described here:
> http://harmony.apache.org/subcomponents/drlvm/JIT.html

OK, I mistakenly applied the term 'design' in a place where 'general
code readability' should have been mentioned. My bad.

A new helper, and two passes dependent on each other while other
optimizations have no info about them, make the code harder to maintain
(or to pick the best combination of passes with a genetic algorithm).
That was my point.

> > 2. this optimization has non-obvious side effects on BBP and
> 
> Let me presume that with 'BBP' here you really mean 'thread suspension'.
> Then it has near-zero effect - please see numbers above.

see above, please
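
To be concrete about the cap I suggested earlier, I mean roughly this
shape. The names, the threshold of 80, and fastFill() are all made up by
me for illustration; this is not Jitrino code:

// A sketch of the length cap: small arrays get the straight-line optimized
// fill, long arrays keep the ordinary loop, whose back-branch polling point
// (BBP) lets a suspension request be served quickly.
public class CappedFill {
    static final int MAX_ALLOWED_CHARS_FOR_ARRAY_FILL = 80;  // hypothetical threshold

    static void fill(char[] a, char c) {
        if (a.length <= MAX_ALLOWED_CHARS_FOR_ARRAY_FILL) {
            fastFill(a, c);                      // safepoint-free fast path
        } else {
            for (int i = 0; i < a.length; i++) { // regular loop, keeps its BBP
                a[i] = c;
            }
        }
    }

    // Stand-in for the optimized, wide-store fill the JIT would emit.
    static void fastFill(char[] a, char c) {
        for (int i = 0; i < a.length; i++) {
            a[i] = c;
        }
    }

    public static void main(String[] args) {
        char[] small = new char[16];
        char[] big = new char[1 << 20];
        fill(small, 'a');
        fill(big, 'b');
        System.out.println(small[0] + " " + big[big.length - 1]);  // a b
    }
}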

> >    unrolling (now we seem to agree that they complement each other
> >    most of the time).
> > They affect the design. Just for reference. Not for endless
> > discussion.
> >
> >> All these 'more complicated', etc. are too loose. If you have any
> >> particular suggestion, it would be nice to discuss it, rather than talking
> >> in abstract general wordings. Or it would be even better to see a
> >> patch for what you think needs an update, and why.
> > suggestions on testing can be seen above
> >
> >> And sorry - I did not get the point about the bugs.
> >> Bugs can be easily fixed (if any) - what's the problem with it?
> > 1. sometimes not easily
> > 2. our first priority is to become more stable
> 
> You care about stability a lot, which in itself is a good thing.

You mean, too much? Yes, probably.

> >>>> Surely, it would be just wonderful to have all optimizations at once.
> >>>>
> >>>> I really mean 'all' - 'every single possible optimization'.
> >>>> It would be really great.
> >>> yep
> >>>
> >>>> Though software development (as you probably know ;-) ) is an
> >>>> iterative and evolutionary process.
> >>> there are evolutions and evolutions
> >>> I am not trying to convince anyone to stop evolution, no. I wanted to
> >>> emphasize that we can look back and see that another approach would
> >>> (probably) have been better, and would have been almost as hard.
> >> In this 'probably better' the key is 'probably', and 'better' is far
> >> from proven. :-)
> >> Luckily, we still have a chance to test this 'probably' against the
> >> highest standards, and even compare it with the existing code.
> > This seems endless. Alex, do you want to avoid technical discussions?
> 
> I'm really sorry if something in my posts made you think so.
> In fact, I was constantly asking for *technical* facts or
> estimates rather than emotional reactions, but it's all my fault if you
> took my words the opposite way. :-(

Thanks.

> Anyway, this thread has become really hard to read. If you still think
> there is anything to discuss, then could you please post another topic?

Excuse me - I use Gnus, which is a great tool, and I often see no
problem with large emails even when they turn into a disaster for others.

Now I observe that the discussion has almost converged. Just the thread
suspension question is left to make clear..

-- 
Egor Pasko

