harmony-dev mailing list archives

From "Aleksey Shipilev" <aleksey.shipi...@gmail.com>
Subject Re: [drlvm][jitrino][test] Large contribution of reliability test cases for DRLVM+Dacapo
Date Tue, 02 Dec 2008 12:23:45 GMT
Hi, Egor!

Your thoughts are as pessimistic as those of everyone who has developed
at least one compiler. Of course, there's no silver bullet: there's no
system where you can press the big red button and it will tell you
where the bugs are :)

The whole point of this fuzz testing is:
 a. Yes, there can be false positives.
 b. Yes, there can be plenty of false positives.
 c. Somewhere beneath that stack, real issues are hiding.

The problem is that no matter how we approach automated compiler
testing, any set of results will bury the real issues under roughly
the same amount of garbage.

With a plain random search you'd have the whole search space to track:
200 boolean params effectively produce 2^200 possible tuples. What
these results offer is a focus on near-optimal configurations, so we
needn't scratch our heads over configurations that lie far away from
the optimum. Again, there can be lots of garbage in those tests, but
5,400+ is a number I can live with, unlike 2^200. Having this few
tests lets me actually tackle them, without needing another
young-looking Universe to run them in. <g>
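
To give a flavor of the setup, here is a minimal sketch of such a
harness. The flag names, the "-Xem:<file>" launcher option and the
benchmark jar are placeholders, not the actual thesis code; it just
generates random boolean tuples, runs the VM under each generated
emconf and keeps the ones that fail.

import java.io.File;
import java.io.FileWriter;
import java.io.InputStream;
import java.util.Random;

// Minimal sketch, NOT the actual thesis harness: the flag names, the
// "-Xem:<file>" option and the benchmark jar are placeholders.
public class ConfigFuzzer {
    // ~200 boolean options in the real setup; three placeholders here.
    static final String[] FLAGS = { "opt_a", "opt_b", "opt_c" };

    public static void main(String[] args) throws Exception {
        Random rnd = new Random();
        for (int i = 0; i < 1000; i++) {
            // Write a random boolean tuple out as an emconf-like file.
            File conf = File.createTempFile("fuzz", ".emconf");
            FileWriter out = new FileWriter(conf);
            for (String flag : FLAGS) {
                out.write(flag + "=" + rnd.nextBoolean() + "\n");
            }
            out.close();
            // Hypothetical launch of the VM under the generated configuration.
            Process p = new ProcessBuilder(
                    "java", "-Xem:" + conf.getPath(), "-jar", "dacapo.jar", "fop")
                    .redirectErrorStream(true).start();
            // Drain the output so the child doesn't block on a full pipe.
            InputStream in = p.getInputStream();
            while (in.read() != -1) { /* discard */ }
            if (p.waitFor() != 0) {
                // Non-zero exit: keep this emconf in the pile of failing configs.
                System.out.println("FAILED: " + conf.getPath());
            }
        }
    }
}

The real search replaces the blind random sampling with GA selection,
crossover and mutation over those tuples, which is exactly what pushes
the surviving configurations towards the near-optimal region.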

But the discussion is really inspiring, thanks! The point of
contributing those tests was my impression that JIT developers are
crying out for tests and bugs to fix. Ian Rogers from JikesRVM had
asked me to contribute the failure reports for JikesRVM, solely for
testing the deep dark corners of the RVM, so I extrapolated the same
intention to Harmony. I certainly underestimated the failure rates for
RVM and Harmony, and now I have to think about how to get value out of
that pile of crashed configurations. For now, I have just disclosed
them to the community without a clear idea of what to do next.
Nevertheless, in the background we are all thinking about what to do.

Please don't take offense :) I know perfectly well that the tests need
human-assisted post-processing, I know there is a lot of garbage, and
I know there are lots of implications and complications around them. I
also suspect that this kind of work is like running ahead of the
train. But anyway, the work is done; it was an auxiliary result, so we
could just dump it -- but can we make any use of it?

Re-testing those issues in debug mode to build a clearer taxonomy of
the crashes is an excellent idea. Though it's no longer related to my
job or thesis, I also have an idea for sweeping the tests and making
them more fine-grained: introduce a similarity metric and search for
the nearest non-failing configuration, as in the sketch below. Any
other ideas?
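
For the record, a rough sketch of that last idea, assuming each emconf
can be flattened into a plain boolean vector of option values (the
names and types are simplified): for every failing configuration, find
the closest passing one by Hamming distance, and the handful of flags
that differ becomes the fine-grained test case.

import java.util.List;

// Rough sketch of the similarity idea; assumes emconfs are already
// flattened into boolean vectors of option values.
public class NearestPassing {

    // Hamming distance: the number of options on which two configs differ.
    static int distance(boolean[] a, boolean[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) d++;
        }
        return d;
    }

    // For a failing config, return the closest non-failing one; the flags
    // that differ between the two are the prime suspects for the crash.
    static boolean[] nearestPassing(boolean[] failing, List<boolean[]> passing) {
        boolean[] best = null;
        int bestDist = Integer.MAX_VALUE;
        for (boolean[] candidate : passing) {
            int d = distance(failing, candidate);
            if (d < bestDist) {
                bestDist = d;
                best = candidate;
            }
        }
        return best;
    }
}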

Thanks a lot,
Aleksey.

On Tue, Dec 2, 2008 at 12:37 PM, Egor Pasko <egor.pasko@gmail.com> wrote:
> On the 0x50E day of Apache Harmony Aleksey Shipilev wrote:
>> Hi, Egor!
>>
>> I will disclose the methodology a little later. If you're interested
>> in which options were swizzled (I selected them by hand and may have
>> missed some), please look into one of those emconfs; there are plenty
>> of options at the end of the file.
>>
>> Nevertheless, there are compiler failures during the swizzling; even
>> if the configuration produced is bad, the compiler should tell me
>> about it, not crash voluntarily :)
>
> not quite possible.
>
> There is no pre-analyser for optimization passes, nor is there
> anything in each optpass to analyze adjacent optpasses. The best each
> optpass can do is to detect that the IR is not well-formed. We cannot
> do full well-formedness checks every time for performance reasons,
> and often the fastest way to find an error is to start optimizing and
> hit an assertion in the code. That is what we do. There are many
> downsides to this approach; one of them is that you have to compile a
> thousand methods to detect some obvious configuration problem, just
> because most methods don't trigger anything interesting.
>
> There are a bunch of asserts to flag incorrect IR (only in debug
> mode, for performance reasons, sorry).
>
>> If I had those clues during the search, I would constrain the search
>> within those boundaries, but the compiler just crashes. So even if
>> some failures are legitimate, they need to be documented, and a
>> meaningful message should be produced instead of a crash.
>
> What if there is no algorithm to determine whether an arbitrary
> optimization pass is correct in terms of Java semantics? I am not
> sure that is the case, but it seems like a hard problem in general.
>
> In my opinion, some issues can be clearly documented (and should
> be!) to avoid simple inconveniences, while others cannot. The hard
> problem is not only implementing such a verifying algorithm, but even
> more so maintaining it as the optimization passes change.
>
>> That way I think every emconf is worth reviewing.
>
> I would say it is worth reviewing if there is a certain level of
> effectiveness in the process. Say, no less than 50% of the emconf
> files hit a bug in an optimization.
>
> We could jump into analysis of the emconf failures with a process
> like this: if it's a misconfiguration, add a guarding rule (which
> doubles as documentation); if it's a bug in an optpass, fix it.
> Although this might seem like the right way to go, it could also lead
> to thousands of exception rules with no sign of convergence. Do we
> need such rules? Do we want to spend time gaining this abstract
> knowledge?
>
>> Thanks,
>> Aleksey.
>>
>> On Tue, Dec 2, 2008 at 11:21 AM, Egor Pasko <egor.pasko@gmail.com> wrote:
>>> On the 0x506 day of Apache Harmony Aleksey Shipilev wrote:
>>>> Hi,
>>>>
>>>> I have already done the same thing for JikesRVM [1], and now the
>>>> time for Harmony has come.
>>>>
>>>> As part of my MSc thesis I used a GA to swizzle the JIT
>>>> configuration for DRLVM in search of an optimal one for running the
>>>> DaCapo/SciMark2 benchmarks. While the performance data is being
>>>> re-verified (there is a preliminary +10% on some sub-benchmarks,
>>>> btw), I parsed the failure logs, and this gives me 5,700+ emconfs
>>>> [2] on which DRLVM/Jitrino is failing.
>>>>
>>>> What makes those reports really interesting is that, due to the
>>>> nature of the search, most of the configurations tested lie near
>>>> local performance maxima. That makes the tests more valuable, as
>>>> they exercise possible near-optimal configurations.
>>>>
>>>> If anyone is interested in those and wishes to hear more about the
>>>> reports, please don't hesitate to ask :)
>>>> I will eventually elaborate on some of these crashes, but not in
>>>> the nearest future.
>>>
>>> Aleksey, great work! (at least on some sub-benchmarks, btw:)
>>>
>>> Generally I am a bit skeptical about the effectiveness of analyzing
>>> these failures. It would be interesting to read about your
>>> methodology, i.e. did you put in some constraints by hand to avoid
>>> failures that are expected by design? An example: if you don't put
>>> ssa/dessa in the right places (ssa before optimizations that require
>>> SSA form, dessa after optimization passes that require non-SSA
>>> form), you get a JIT failure.
>>>
>>> The sad story is that there are many such "by design" peculiarities,
>>> many of them undocumented and hard to discover.
>>>
>>> --
>>> Egor Pasko
>>>
>>>
>>
>
> --
> Egor Pasko
>
>
