harmony-dev mailing list archives

From "Weldon Washburn" <weldon...@gmail.com>
Subject Re: [drlvm][threading] H3010 (Stack Overflow Exception) -- when does this bug really have to be fixed?
Date Mon, 12 Mar 2007 18:03:58 GMT
On 12 Mar 2007 19:46:06 +0300, Egor Pasko <egor.pasko@gmail.com> wrote:
>
> On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> > All,
> > I assigned H3010 to myself.  This test definitely demonstrates a bug that
> > needs fixing.  But it's not clear when this bug must be fixed.  This really
> > brings forward a higher-level question: what should we code this bug as
> > right now, and when would it be moved to "blocker" status?  I provide some
> > observations to start the discussion:
> >
> > 1)
> > The bug is a Stack Overflow Exception (SOE) that happens inside fast
> > native helper functions.  Fast native helpers do not set up the M2N stack
> > frame, which is required to throw exceptions such as SOE.  Adding M2N
> > setup to fast native helpers would unacceptably slow down the system.
>
> to be honest..
>
> SOE can happen from a 'push' onto stack (such pushes are not
> safepoints in JIT currently). Thus, you cannot unwind properly (no M2N
> necessary for releasing the lock).
>
> Do you think it is a low probability?


Good point.  Yes, SOE can happen from jitted code doing stuff like "push
ebp".  And we have to handle this case properly.  And it will require a
design discussion between JIT and VM developers.  This is a really
interesting topic.  But the question remains.  Do we have to solve this
issue in Q1?  Q4?  2008?  To answer this question, we have to ask what
workloads we want to run in Q1/Q2/Q3...  And then find out if the workloads
hit the SOE problem we are discussing.  My guess is that if useful workloads
we want to run actually hit SOE, we will be able to work around it by simply
making the stack a little bigger.  Also my guess is that Java compatibility
tests (tck?) will specifically test this case.  In other words, it's
probably needed for compliance but not really needed for getting important
workloads running.
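
To make the "bigger stack" workaround concrete, here is a minimal sketch
(not taken from H3010 or from DRLVM). It assumes a VM that honors the
stackSize argument of the four-argument Thread constructor; the Java API
documents that argument as a hint only, so some VMs may ignore it. The class
name StackSizeSketch and the helper maxDepth are made up for illustration.

```java
class StackSizeSketch {
    // Counts how many nested calls succeeded before the stack ran out.
    private int depth;

    private void recurse() {
        depth++;
        recurse();
    }

    // Runs unbounded recursion on a thread created with the requested stack
    // size and returns the depth reached when StackOverflowError was thrown.
    static int maxDepth(long stackSizeBytes) {
        final StackSizeSketch probe = new StackSizeSketch();
        Runnable body = () -> {
            try {
                probe.recurse();
            } catch (StackOverflowError e) {
                // Expected: the overflow ends the recursion.
            }
        };
        // stackSizeBytes is only a hint to the VM.
        Thread t = new Thread(null, body, "soe-probe", stackSizeBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        System.out.println("256 KB request: depth " + maxDepth(256L * 1024));
        System.out.println("8 MB request:   depth " + maxDepth(8L * 1024 * 1024));
    }
}
```

On a VM that respects the hint, the second depth should come out noticeably
larger than the first. A workload that overflows today would then simply be
started on a thread with a larger stack.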

> > 2)
> > When running a useful workload, a Stack Overflow that hits precisely on a
> > fast native has a very low probability.  Note the test in H3010
> > specifically forces this event to happen with a very high probability.  In
> > other words, while the test is a good one, it reflects a very rare event
> > in nature.
> >
> > Given the above, how about we address fixing the problem in two stages:
> >
> > 1)
> > First stage: add an "assert(zero);" to the exception handler when it is
> > determined an SOE has happened inside a fast native.  This way, we will
> > find out quickly when an important workload hits this bug.  Once the
> > assert(zero) is added, we code H3010 as "later".
> >
> > 2)
> > Second stage: When an application we care about hits the assert(zero), we
> > recode H3010 as "major/blocker".
> >
> > 3)
> > While waiting for #2 above to happen, we discuss on harmony-dev ways of
> > designing the right fix.  For starters, I think we should investigate a
> > design where the exception handler rewrites the entire register context so
> > that returning from the exception handler revectors the instruction
> > pointer to recovery code that will somehow push the M2N frame on the stack
> > and call the proper SOE throwing code.  I have not looked closely at how
> > to do this.  I am not convinced this approach will work.  However, I do
> > think it's worth a try.  Thoughts?
>
> --
> Egor Pasko
>
>


-- 
Weldon Washburn
Intel Enterprise Solutions Software Division
