harmony-dev mailing list archives

From "Weldon Washburn" <weldon...@gmail.com>
Subject Re: [drlvm][threading] H3010 (Stack Overflow Exception) -- when does this bug really have to be fixed?
Date Mon, 12 Mar 2007 20:35:41 GMT
On 12 Mar 2007 21:52:45 +0300, Egor Pasko <egor.pasko@gmail.com> wrote:
>
> On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> > On 12 Mar 2007 19:46:06 +0300, Egor Pasko <egor.pasko@gmail.com> wrote:
> > >
> > > On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> > > > All,
> > > > I assigned H3010 to myself.  This test definitely demonstrates a bug that
> > > > needs fixing.  But it's not clear when this bug must be fixed.  This
> > > > really brings forward a higher-level question: how should this bug be
> > > > coded right now, and when would it be moved to "blocker" status?  I
> > > > provide some observations to start the discussion:
> > > >
> > > > 1)
> > > > The bug is a Stack Overflow Exception (SOE) that happens inside fast
> > > > native helper functions.  Fast native helpers do not set up the M2N
> > > > (managed-to-native) stack frame, which is required to throw exceptions
> > > > such as SOE.  Adding M2N setup to fast native helpers would unacceptably
> > > > slow down the system.
> > >
> > > to be honest..
> > >
> > > SOE can happen from a 'push' onto the stack (such pushes are not
> > > currently safepoints in the JIT). Thus, you cannot unwind properly (there
> > > is no M2N frame, which is necessary for releasing the lock).
> > >
> > > Do you think it is a low probability?
> >
> >
> > Good point.  Yes, SOE can happen from jitted code doing stuff like "push
> > ebp".  And we have to handle this case properly.  It will require a
> > design discussion between JIT and VM developers.  This is a really
> > interesting topic.  But the question remains: do we have to solve this
> > issue in Q1?  Q4?  2008?  To answer this question, we have to ask what
> > workloads we want to run in Q1/Q2/Q3... and then find out whether those
> > workloads hit the SOE problem we are discussing.  My guess is that if
> > useful workloads we want to run actually hit SOE, we will be able to work
> > around it by simply making the stack a little bigger.  Also, my guess is
> > that Java compatibility tests (TCK?) will specifically test this case.  In
> > other words, it's probably needed for compliance but not really needed for
> > getting important workloads running.
>
> that has some relevance to the -Xss option. If we implement it, almost
> any "popular workload" would crash with a SEGV instead of properly
> throwing SOE when run with a small stack size.
>
> One might argue that running a "popular workload" with a small stack
> size makes the workload "not so popular". I dunno.


I understand your argument.  It makes perfect sense.  But the question
remains: is this a bug that has to be fixed in Q2, or in 2008?  Is it
acceptable to simply bump up the stack size to get Q2 workloads running?
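Both the -Xss point and the "bump up the stack size" workaround come down to how large each thread's native stack is. As a rough sketch only, assuming a Java thread runs on an ordinary POSIX thread (an assumption, not something stated in this thread), the knob looks like this in C. The names `workload_thread` and `spawn_with_stack` and the 16 MB figure are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM thread running the workload. */
static void *workload_thread(void *arg) {
    (void)arg;
    return NULL;
}

/*
 * Hypothetical helper: start a thread whose native stack is `bytes` long,
 * wait for it to finish, and report through *granted the stack size the
 * attribute object recorded. Returns 0 on success or a pthread error code
 * (for example EINVAL when `bytes` is below PTHREAD_STACK_MIN).
 */
int spawn_with_stack(size_t bytes, size_t *granted) {
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, bytes);
    if (rc == 0)
        rc = pthread_attr_getstacksize(&attr, granted);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, workload_thread, NULL);
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}
```

For example, `spawn_with_stack((size_t)16 * 1024 * 1024, &got)` is expected to return 0 and leave `got` at 16 MB. A bigger stack only makes an overflow less likely; when one does happen inside a fast native helper, it is no safer than before.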

> > > > 2)
> > > > When running a useful workload, a stack overflow that hits precisely on
> > > > a fast native has a very low probability.  Note that the test in H3010
> > > > specifically forces this event to happen with a very high probability.
> > > > In other words, while the test is a good one, it reflects an event that
> > > > is very rare in practice.
> > > >
> > > > Given the above, how about we address fixing the problem in two stages:
> > > >
> > > > 1)
> > > > First stage: add an "assert(zero);" to the exception handler for the
> > > > case where it determines that an SOE has happened inside a fast native.
> > > > This way, we will find out quickly when an important workload hits this
> > > > bug.  Once the assert(zero) is added, we code H3010 as "later".
> > > >
> > > > 2)
> > > > Second stage: when an application we care about hits the assert(zero),
> > > > we recode H3010 as "major/blocker".
> > > >
> > > > 3)
> > > > While waiting for #2 above to happen, we discuss on harmony-dev ways of
> > > > designing the right fix.  For starters, I think we should investigate a
> > > > design where the exception handler rewrites the entire register context
> > > > so that returning from the exception handler revectors the instruction
> > > > pointer to recovery code that will somehow push the M2N frame on the
> > > > stack and call the proper SOE-throwing code.  I have not looked closely
> > > > at how to do this.  I am not convinced this approach will work.
> > > > However, I do think it's worth a try.  Thoughts?
> > >
> > > --
> > > Egor Pasko
> > >
> > >
> >
> >
> > --
> > Weldon Washburn
> > Intel Enterprise Solutions Software Division
>
> --
> Egor Pasko
>
>

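The first stage of the plan can be sketched as a check on the faulting instruction pointer. This is a hypothetical C sketch, not DRLVM code: the helper-range table and the functions `register_fast_native`, `find_fast_native`, and `soe_may_throw` are invented for illustration, and the sketch assumes the VM can tell which code ranges belong to fast native helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one fast native helper's code range [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
    const char *name;
} helper_range;

/* Hypothetical registry of fast native helpers (fixed size for the sketch). */
#define MAX_FAST_NATIVES 64
static helper_range fast_natives[MAX_FAST_NATIVES];
static size_t fast_native_count = 0;

/* Record a helper's code range; a real VM would do this when it emits the helper. */
void register_fast_native(uintptr_t start, uintptr_t end, const char *name) {
    assert(fast_native_count < MAX_FAST_NATIVES);
    assert(start < end);
    fast_natives[fast_native_count].start = start;
    fast_natives[fast_native_count].end = end;
    fast_natives[fast_native_count].name = name;
    fast_native_count++;
}

/* Return the helper whose range contains ip, or NULL if ip is in no fast native. */
const helper_range *find_fast_native(uintptr_t ip) {
    for (size_t i = 0; i < fast_native_count; i++) {
        if (ip >= fast_natives[i].start && ip < fast_natives[i].end)
            return &fast_natives[i];
    }
    return NULL;
}

/*
 * Stage 1: call this from the stack-overflow path of the fault handler with
 * the faulting instruction pointer. Inside a fast native there is no M2N
 * frame to throw from, so stop loudly instead of unwinding incorrectly.
 * Returns 1 if the caller may throw SOE the normal way, 0 otherwise (0 is
 * only reachable when assertions are disabled with NDEBUG).
 */
int soe_may_throw(uintptr_t fault_ip) {
    const helper_range *h = find_fast_native(fault_ip);
    assert(h == NULL && "H3010: SOE inside a fast native helper (no M2N frame)");
    return h == NULL;
}
```

`assert` is compiled out under `NDEBUG`; in that build the function returns 0 and the caller still has to decide what to do, which is the case the eventual fix has to cover.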

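The register-context rewrite floated in item 3 can be modelled without any real signal machinery. The C sketch below is a toy: `cpu_context`, `fast_native_helper`, `recovery_stub`, `soe_handler`, and `resume` are hypothetical names, code addresses are modelled as function pointers, and "pushing the M2N frame" is reduced to setting a flag. It only illustrates the control-flow idea: the handler does not throw itself. Instead, it edits the saved context so that resuming lands in recovery code that can build the missing frame before throwing.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a saved register context. A real handler would receive the
 * OS-provided machine context; here the instruction pointer is modelled as
 * a pointer to a C function that takes the context. */
typedef struct cpu_context cpu_context;
typedef void (*code_ptr)(cpu_context *ctx);

struct cpu_context {
    code_ptr ip;          /* where execution resumes when the handler returns */
    code_ptr faulting_ip; /* stashed by the handler for the recovery code */
    int m2n_pushed;       /* set once the simulated M2N frame exists */
    int soe_thrown;       /* set once the simulated SOE has been raised */
};

/* Hypothetical fast native helper in which the overflow is taken to occur. */
void fast_native_helper(cpu_context *ctx) { (void)ctx; }

/* Hypothetical recovery code: first build the missing M2N frame (simulated
 * by a flag; a real one would describe faulting_ip), then raise SOE through
 * the normal throwing path (also simulated by a flag). */
void recovery_stub(cpu_context *ctx) {
    ctx->m2n_pushed = 1;
    ctx->soe_thrown = 1;
}

/* Hypothetical handler: it does not throw. It remembers where the fault was
 * and rewrites the saved ip so that resuming revectors to recovery_stub. */
void soe_handler(cpu_context *ctx) {
    ctx->faulting_ip = ctx->ip;
    ctx->ip = recovery_stub;
}

/* Simulated "return from the exception handler": continue at the saved ip. */
void resume(cpu_context *ctx) {
    assert(ctx->ip != NULL);
    ctx->ip(ctx);
}
```

On a real system the edit would be made to the machine context handed to the fault handler (for example the `ucontext_t` passed to a POSIX `SA_SIGINFO` handler, or a Windows `CONTEXT` record). That raises issues this toy ignores, such as which stack the recovery code can run on when the thread's own stack is exhausted.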
-- 
Weldon Washburn
Intel Enterprise Solutions Software Division
