harmony-dev mailing list archives

From Egor Pasko <egor.pa...@gmail.com>
Subject Re: [drlvm][threading] H3010 (Stack Overflow Exception) -- when does this bug really have to be fixed?
Date Tue, 13 Mar 2007 13:16:25 GMT
On the 0x298 day of Apache Harmony Gregory Shimansky wrote:
> Egor Pasko wrote:
> > On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> >> All,
> >> I assigned H3010 to myself.  This test definitely demonstrates a bug that
> >> needs fixing.  But it's not clear when this bug must be fixed.  This really
> >> brings forward a higher-level question: what do we code this bug as right
> >> now, and when would this bug be moved to "blocker" status?  I provide some
> >> observations to start the discussion:
> >>
> >> 1)
> >> The bug is a Stack Overflow Exception (SOE) that happens inside fast native
> >> helper functions.  Fast native helpers do not set up the M2N stack frame,
> >> which is required to throw exceptions such as SOE.  Adding M2N setup to fast
> >> native helpers would unacceptably slow down the system.
> > to be honest..
> > SOE can happen from a 'push' onto stack (such pushes are not
> > safepoints in JIT currently). Thus, you cannot unwind properly (no M2N
> > necessary for releasing the lock).
> > Do you think it is a low probability?
> 
> If SOE happens in managed code it is handled similarly to hardware NPEs,
> that is, the stack is unwound to the exception handler if it exists or
> to the nearest native frame. AFAIK hardware NPE places are not
> safepoints either, so there is a small possibility for enumeration bugs
> in such places.

Yes, you are right. Still, hardware NPEs are disabled for
a) try..catch regions 
b) synchronized methods (VM needs jit_get_address_of_this)

Okay, I have to agree that both (a) and (b) are not likely to hit SOE
on "popular workloads". And, AFAIR, we have both bugs in JIRA, so all
is going fine. Let's think of them as having low probability.

> >> 2)
> >> When running a useful workload, a Stack Overflow that hits precisely on a
> >> fast native has a very low probability.  Note the test in H3010 specifically
> >> forces this event to happen with a very high probability.  In other words,
> >> while the test is good, it reflects a very rare event in nature.
> >>
> >> Given the above, how about we address fixing the problem in two stages:
> >>
> >> 1)
> >> First stage: add an "assert(zero);" to the exception handler when it is
> >> determined an SOE has happened inside a fast native.  This way, we will find
> >> out quickly when an important workload hits this bug.  Once the assert(zero)
> >> is added, we code H3010 as "later"
> >>
> >> 2)
> >> Second stage: When an application we care about hits the assert(zero), we
> >> recode H3010 as "major/blocker".
> >>
> >> 3)
> >> While waiting for #2 above to happen, we discuss on harmony-dev ways of
> >> designing the right fix.  For starters, I think we should investigate a
> >> design where the exception handler rewrites the entire register context so
> >> that returning from exception handler revectors the instruction pointer to
> >> recovery code that will somehow push the M2N frame on the stack and call
> >> proper SOE throwing code.  I have not looked closely at how to do this.  I
> >> am not convinced this approach will work.  However, I do think it's worth a
> >> try.  Thoughts?
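
For illustration only, here is a minimal sketch in C of the staged plan
quoted above. It is not DRLVM code. Every name in it (helper_range,
fault_context, classify_soe, stage1_check, redirect_to_recovery) is
hypothetical. It assumes that fast native helpers occupy known address
ranges, and it models the register context as a plain struct holding
only the instruction pointer. A real handler would receive the full
platform register context and would have to deal with the exhausted
stack itself, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: code address range of one fast native helper. */
typedef struct {
    uintptr_t start;  /* first byte of the helper's code */
    uintptr_t end;    /* one past the last byte of the helper's code */
} helper_range;

/* Hypothetical, heavily simplified register context seen by the handler. */
typedef struct {
    uintptr_t ip;        /* instruction pointer to resume at */
    uintptr_t fault_ip;  /* where the overflow originally hit */
} fault_context;

typedef enum { SOE_IN_MANAGED_CODE, SOE_IN_FAST_NATIVE } soe_site;

/* Decide whether the faulting instruction lies inside a fast native helper. */
static soe_site classify_soe(const helper_range *helpers, size_t n,
                             uintptr_t ip)
{
    for (size_t i = 0; i < n; i++) {
        if (ip >= helpers[i].start && ip < helpers[i].end)
            return SOE_IN_FAST_NATIVE;
    }
    return SOE_IN_MANAGED_CODE;
}

/* First stage: fail loudly (the "assert(zero)" of the proposal) when an
 * SOE is detected inside a fast native helper. */
static void stage1_check(soe_site site)
{
    assert(site != SOE_IN_FAST_NATIVE);
}

/* Design under discussion: rewrite the context so that returning from
 * the handler resumes in recovery code, which would then push the M2N
 * frame and throw the SOE. Returns 1 if the context was rewritten. */
static int redirect_to_recovery(fault_context *ctx,
                                const helper_range *helpers, size_t n,
                                uintptr_t recovery_stub)
{
    if (classify_soe(helpers, n, ctx->ip) != SOE_IN_FAST_NATIVE)
        return 0;              /* managed code: leave to the usual unwinding */
    ctx->fault_ip = ctx->ip;   /* remember where the fault happened */
    ctx->ip = recovery_stub;   /* resume in the recovery code instead */
    return 1;
}
```

In this sketch the exception handler would call classify_soe() with the
faulting address, then either stage1_check() (first stage) or
redirect_to_recovery() (the design to investigate). Whether the recovery
code can safely push an M2N frame at that point is exactly the open
question raised above.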
> >
> 
> 
> -- 
> Gregory
> 
> 

-- 
Egor Pasko

