harmony-dev mailing list archives

From "Ian Rogers" <rogers.em...@gmail.com>
Subject Re: [drlvm][jitrino][test] Large contribution of reliability test cases for DRLVM+Dacapo
Date Tue, 02 Dec 2008 13:10:53 GMT
2008/12/2 Aleksey Shipilev <aleksey.shipilev@gmail.com>:
> Hi, Egor!
> Your thoughts are truly pessimistic, like those of everyone who has
> developed at least one compiler. Of course, there's no silver bullet;
> there's no system where you can press the big red button and have it
> tell you where the bugs are :)
> The whole point of that fuzz testing is:
>  a. Yes, there can be false positives.
>  b. Yes, there can be plenty of false positives.
>  c. Somewhere beneath the pile, real issues are buried.
> The problem is, whatever we think about automated compiler testing,
> any testing approach would pile nearly the same amount of garbage on
> top of the real issues.
> If you do a random search, you have the whole search space to track:
> 200 boolean params effectively produce 2^200 possible tuples. What
> these results offer is a focus on near-optimal configurations, so we
> needn't scratch our heads over configurations that lie far from
> optimal.
> Again, there can be lots of garbage in those tests, but 5,400+ is a
> number I can live with, unlike 2^200. Having this few tests lets me
> actually tackle them, without needing another young-looking Universe
> to run them in. <g>
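The scale gap here is easy to make concrete. A quick back-of-the-envelope sketch in plain Java (nothing DRLVM-specific is assumed; the 5,400 figure is the count of failing emconfs mentioned above):

```java
import java.math.BigInteger;

public class SearchSpace {
    // Number of distinct tuples over n boolean options: 2^n.
    static BigInteger tuples(int n) {
        return BigInteger.ONE.shiftLeft(n);
    }

    public static void main(String[] args) {
        // 200 boolean params -> 2^200 configurations, a 61-digit number,
        // versus the few thousand failing configurations the GA surfaced.
        System.out.println(tuples(200).toString().length() + " digits");
    }
}
```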
> But the discussion is really inspiring, thanks! The point of
> contributing those tests was my impression that JIT developers are
> crying out for tests and bugs to fix. Ian Rogers from JikesRVM had
> asked me to contribute the failure reports for JikesRVM, solely for
> testing the deep dark corners of the RVM, so I extrapolated the same
> intention to Harmony. I certainly underestimated the failure rates of
> the RVM and Harmony, and now have to figure out how to get value out
> of that pile of crashed configurations. For now, I have just
> disclosed them to the community without a clear plan for what to do
> next. Nevertheless, in the background we are all thinking about it.
> Please don't take offense :) I know perfectly well that the tests
> need human-assisted post-processing, I know there is a lot of
> garbage, and I know there are lots of implications and complications
> around this. I also suspect that this kind of work is like running
> ahead of the train. But anyway, the work is done; it was an auxiliary
> result, so we could just dump it -- but can we make any use of it?
> There's an excellent idea about re-testing these issues in debug
> mode, to build a clearer taxonomy of the crashes. Though it's no
> longer related to my job and thesis, I also have an idea for sweeping
> the tests and making them more fine-grained: introduce a similarity
> metric and search for the nearest non-failing configuration. Any
> other ideas?
> Thanks a lot,
> Aleksey.
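Aleksey's closing idea can be sketched concretely. Assuming an emconf boils down to a tuple of boolean options, Hamming distance is a natural similarity metric, and a greedy walk toward a known-good baseline finds a nearby non-failing configuration. This is only a sketch: the `fails` predicate stands in for an actual DRLVM run and is purely hypothetical.

```java
import java.util.function.Predicate;

public class ConfigSweep {
    // Similarity metric: number of options on which two configs differ.
    static int hamming(boolean[] a, boolean[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) if (a[i] != b[i]) d++;
        return d;
    }

    // Greedily flip options of a failing config toward a known-good
    // baseline, stopping as soon as the config no longer fails. Returns
    // a (hopefully nearby) non-failing configuration.
    static boolean[] nearestPassing(boolean[] failing, boolean[] good,
                                    Predicate<boolean[]> fails) {
        boolean[] cur = failing.clone();
        for (int i = 0; i < cur.length && fails.test(cur); i++) {
            if (cur[i] != good[i]) cur[i] = good[i]; // step toward baseline
        }
        return cur;
    }
}
```

The options that had to be flipped form a small delta that points at the failure-inducing combination, which is the fine-grained view Aleksey is after.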

A particular tale of sorrow from Jikes RVM: a bug crept into our SSA
code and went unnoticed because naive loop unrolling masked it. When
we later experimented with disabling loop unrolling, to benefit loop
versioning, we found SSA was broken. The bug is sufficiently subtle
that we haven't yet been able to create a unit test for it. We're
slowly reintroducing optimizations into our O2 set once we're
confident they are robust. The nice thing is this gives us scope to
improve our


> On Tue, Dec 2, 2008 at 12:37 PM, Egor Pasko <egor.pasko@gmail.com> wrote:
>> On the 0x50E day of Apache Harmony Aleksey Shipilev wrote:
>>> Hi, Egor!
>>> I will disclose the methodology a little later. If you're
>>> interested in which options were swizzled (I selected them by hand,
>>> and may have missed some), please look into one of those emconfs;
>>> there are plenty of options at the end of the file.
>>> Nevertheless, there are compiler failures during the swizzling;
>>> even if the produced configuration is bad, the compiler should tell
>>> me so, not crash of its own accord :)
>> Not quite possible.
>> There is no pre-analyser for optimization passes, nor is there
>> anything in each optpass to analyze adjacent optpasses. The best
>> each optpass can do is detect that the IR is not well-formed. We
>> cannot do full well-formedness checks every time, for performance
>> reasons, and often the fastest way to find an error is to start
>> optimizing and hit an assertion in the code. That is what we do.
>> This approach has many downsides; one of them is that you have to
>> compile a thousand methods to detect some obvious configuration
>> problem, just because most methods don't trigger anything
>> interesting.
>> There are a bunch of asserts that flag incorrect IR (in debug mode
>> only, for performance reasons, sorry).
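The debug-only checking Egor describes might look like the following sketch: expensive IR invariants guarded by Java's `assert`, which costs nothing unless assertions are enabled (`-ea`), analogous to a debug build. The `Node` class and its invariant are invented for illustration and are not Jitrino's real IR.

```java
import java.util.List;

public class IrChecker {
    static class Node {
        List<Node> operands;
        Node(List<Node> operands) { this.operands = operands; }
    }

    // An (illustrative) invariant: every operand of every node must
    // itself be a node of this IR.
    static boolean wellFormed(List<Node> ir) {
        for (Node n : ir)
            for (Node op : n.operands)
                if (!ir.contains(op)) return false;
        return true;
    }

    static void optimize(List<Node> ir) {
        assert wellFormed(ir) : "malformed IR entering pass"; // debug-only
        // ... the actual optimization pass would run here ...
        assert wellFormed(ir) : "pass broke the IR";          // debug-only
    }
}
```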
>>> If I had those clues during the search, I would constrain the
>>> search within those boundaries, but the compiler just crashes. So
>>> even where failures are by design, they need to be documented, and
>>> a meaningful message should be thrown instead of a crash.
>> What if there is no algorithm to determine whether an arbitrary
>> optimization pass is correct in terms of Java semantics? I am not
>> sure that is the case, but it seems like a hard problem in general.
>> In my opinion some issues can be clearly documented (and should
>> be!) to avoid simple inconveniences, while others cannot. It is
>> hard not only to implement such a verifying algorithm, but also to
>> maintain it as the optimization passes change.
>>> That way I think every emconf is worth reviewing.
>> I would say it is worth reviewing only if the process reaches a
>> certain level of effectiveness: say, no less than 50% of emconf
>> files hitting a real bug in an optimization.
>> We could jump into analyzing the emconf failures with a process
>> like this: if there is a misconfiguration, add a guarding rule
>> (which works as documentation); if there is a bug in an optpass,
>> fix it. Although this might seem the right way to go, it could also
>> lead to thousands of exception rules without any sign of
>> convergence. Do we need such rules? Do we want to spend time
>> gaining this abstract knowledge?
>>> Thanks,
>>> Aleksey.
>>> On Tue, Dec 2, 2008 at 11:21 AM, Egor Pasko <egor.pasko@gmail.com> wrote:
>>>> On the 0x506 day of Apache Harmony Aleksey Shipilev wrote:
>>>>> Hi,
>>>>> I have already done the same thing for JikesRVM [1], and now the
>>>>> time for Harmony has come.
>>>>> As part of my MSc thesis I used a GA to swizzle the JIT
>>>>> configuration for DRLVM in search of an optimal one for running
>>>>> the DaCapo/SciMark2 benchmarks. While the performance data is
>>>>> being re-verified (there is a preliminary +10% on some
>>>>> sub-benchmarks, btw), I parsed the failure logs, which gives me
>>>>> 5,700+ emconfs [2] on which DRLVM/Jitrino is failing.
>>>>> What makes those reports really interesting is that most of the
>>>>> tested configurations lie near local maxima of performance, due
>>>>> to the nature of the search. That makes the tests more valuable,
>>>>> as they exercise plausible near-optimal configurations.
>>>>> If anyone is interested in those and wishes to hear more about
>>>>> the reports, please don't hesitate to ask :)
>>>>> I will eventually elaborate on some of these crashes, but not in
>>>>> the near future.
>>>> Aleksey, great work! (at least on some sub-benchmarks, btw:)
>>>> Generally I am a bit skeptical about the effectiveness of
>>>> analyzing these failures. It would be interesting to read about
>>>> your methodology, i.e. did you put in some constraints by hand to
>>>> avoid failures that are expected by design? An example: if you
>>>> don't put ssa/dessa in the right places (ssa before optimizations
>>>> that require SSA form, and dessa after them, before passes that
>>>> require non-SSA form), you get a JIT failure.
>>>> The sad story is that there are many such "by design"
>>>> peculiarities, many undocumented, many hard to discover.
>>>> --
>>>> Egor Pasko
>> --
>> Egor Pasko
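Egor's "guarding rule" suggestion, applied to his own ssa/dessa example, could be sketched as a cheap up-front pipeline check that rejects a bad configuration with a clear message instead of crashing mid-compile. The pass names and the `requiresSsa` set below are illustrative assumptions, not Jitrino's actual pass list.

```java
import java.util.List;
import java.util.Set;

public class PipelineGuard {
    // Hypothetical set of passes that only work on SSA-form IR.
    static final Set<String> requiresSsa = Set.of("gvn", "copyprop");

    // Returns null if the pipeline is well-ordered, else a diagnostic
    // naming the misplaced pass (a guarding rule that doubles as
    // documentation of the constraint).
    static String check(List<String> passes) {
        boolean inSsa = false;
        for (String p : passes) {
            if (p.equals("ssa")) inSsa = true;
            else if (p.equals("dessa")) inSsa = false;
            else if (requiresSsa.contains(p) && !inSsa)
                return p + " requires SSA form but runs outside ssa/dessa";
        }
        return null;
    }
}
```

As Egor notes, the risk is that such rules accumulate by the thousands as passes change; each one is trivial, but keeping the set complete and current is the hard part.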
