On 12 Mar 2007 21:52:45 +0300, Egor Pasko <egor.pasko@gmail.com> wrote:
>
> On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> > On 12 Mar 2007 19:46:06 +0300, Egor Pasko <egor.pasko@gmail.com> wrote:
> > >
> > > On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> > > > All,
> > > > I assigned H3010 to myself. This test definitely demonstrates a bug
> > > > that needs fixing. But it's not clear when this bug must be fixed.
> > > > This really brings forward a higher-level question. How should we
> > > > code this bug right now, and when would it be moved to "blocker"
> > > > status? I provide some observations to start the discussion:
> > > >
> > > > 1)
> > > > The bug is a Stack Overflow Exception (SOE) that happens inside fast
> > > > native helper functions. Fast native helpers do not set up the M2N
> > > > stack frame, which is required to throw exceptions such as SOE.
> > > > Adding M2N setup to fast native helpers will unacceptably slow down
> > > > the system.
> > >
> > > to be honest..
> > >
> > > SOE can happen from a 'push' onto stack (such pushes are not
> > > safepoints in JIT currently). Thus, you cannot unwind properly (no M2N
> > > necessary for releasing the lock).
> > >
> > > Do you think it is a low probability?
> >
> >
> > Good point. Yes, SOE can happen from jitted code doing stuff like "push
> > ebp". And we have to handle this case properly. And it will require a
> > design discussion between JIT and VM developers. This is a really
> > interesting topic. But the question remains. Do we have to solve this
> > issue in Q1? Q4? 2008?? To answer this question, we have to ask what
> > workloads we want to run in Q1/Q2/Q3... And then find out if the
> > workloads hit the SOE problem we are discussing. My guess is that if
> > useful workloads we want to run actually hit SOE, we will be able to
> > work around it by simply making the stack a little bigger. Also my
> > guess is that Java compatibility tests (tck?) will specifically test
> > this case. In other words, it's probably needed for compliance but not
> > really needed for getting important workloads running.
>
> That has some relevance to the -Xss option. If we implement it, almost
> any "popular workload" would crash in SEGV instead of throwing SOE
> properly when run on a small stack size.
>
> One might argue that running a "popular workload" with a small stack
> size makes the workload "not so popular". I dunno.
I understand your argument. It makes perfect sense. But the question
remains. Is this a bug that has to be fixed in Q2 or in 2008? Is it
acceptable to simply bump up the stack size to get Q2 workloads running?
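
To make the "bump up the stack size" workaround concrete, here is a minimal
sketch that measures how deep a recursion gets before SOE on a thread with a
requested stack size. It assumes a VM that throws StackOverflowError properly
and honors the stackSize hint of the java.lang.Thread constructor (the
Javadoc says the hint may be ignored on some platforms). The class and method
names (StackDepthProbe, maxDepth) are made up for illustration and are not
part of Harmony or of the H3010 test.

```java
public class StackDepthProbe {
    // Recursion counter; only touched by one probe thread at a time.
    private static int depth;

    // Unbounded recursion: each call consumes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Runs the recursion on a fresh thread created with the requested stack
    // size (a hint only) and returns the depth reached before the VM threw
    // StackOverflowError.
    static int maxDepth(long stackBytes) throws InterruptedException {
        final int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                // The stack has unwound to this frame, so it is safe to
                // record the depth here.
                result[0] = depth;
            }
        }, "soe-probe", stackBytes);
        t.start();
        t.join(); // join() makes the write to result[0] visible here
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // Compare a small and a larger requested stack size.
        System.out.println("256 KB stack: depth " + maxDepth(256L * 1024));
        System.out.println("8 MB stack:   depth " + maxDepth(8L * 1024 * 1024));
    }
}
```

On a VM that honors the hint, the larger stack should reach a greater depth,
which is the effect the workaround relies on. The exact numbers depend on the
VM, the JIT, and the platform.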
> > > > 2)
> > > > When running a useful workload, a Stack Overflow that hits precisely
> > > > on a fast native has a very low probability. Note the test in H3010
> > > > specifically forces this event to happen with a very high
> > > > probability. In other words, while the test is a good one, it
> > > > reflects a very rare event in nature.
> > > >
> > > > Given the above, how about we address fixing the problem in two
> > > > stages:
> > > >
> > > > 1)
> > > > First stage: add an "assert(zero);" to the exception handler when it
> > > > is determined an SOE has happened inside a fast native. This way, we
> > > > will find out quickly when an important workload hits this bug. Once
> > > > the assert(zero) is added, we code H3010 as "later".
> > > >
> > > > 2)
> > > > Second stage: When an application we care about hits the
> > > > assert(zero), we recode H3010 as "major/blocker".
> > > >
> > > > 3)
> > > > While waiting for #2 above to happen, we discuss on harmony-dev ways
> > > > of designing the right fix. For starters, I think we should
> > > > investigate a design where the exception handler rewrites the entire
> > > > register context so that returning from the exception handler
> > > > revectors the instruction pointer to recovery code that will somehow
> > > > push the M2N frame on the stack and call the proper SOE throwing
> > > > code. I have not looked closely at how to do this. I am not
> > > > convinced this approach will work. However, I do think it's worth a
> > > > try. Thoughts?
> > >
> > > --
> > > Egor Pasko
> > >
> > >
> >
> >
> > --
> > Weldon Washburn
> > Intel Enterprise Solutions Software Division
>
> --
> Egor Pasko
>
>
--
Weldon Washburn
Intel Enterprise Solutions Software Division