harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Evgueni Brevnov" <evgueni.brev...@gmail.com>
Subject Re: [drlvm][threading] Potential race condition in safe-point callbacks.
Date Tue, 12 Dec 2006 14:41:13 GMT
The following JIRA was created to track the problem
http://issues.apache.org/jira/browse/HARMONY-2648

Thanks
Evgueni

On 12/12/06, Evgueni Brevnov <evgueni.brevnov@gmail.com> wrote:
> Nikolay,
>
> I just tried to separate shutdown related from safe-point callback
> stuff. No problems were identified so far. So I will file a separate
> JIRA for callback related things.
>
> On 12/12/06, Nikolay Kuznetsov <nikolay.kuznetsov@gmail.com> wrote:
> > Evgueni,
> >
> > sorry for intrusion, at the very last moment, in general I agree with
> > all mentioned problems and appreciate your work done for shutdown and
> > suspension enhancements but if you don't mind I have I have few
> > questions or suggestions:
> >
> > 1) Am I right, that all these scenarios came from shutdown sequence,
> > I'm asking because, I would discuss some points but by no means I
> > don't want to stop your shutdown patch which is good in general, but
> > if those changes can go to the code base separately it would be
> > preferable.
> >
> > 2) May I ask you to separate safepoint_callback related fixes into
> > separate JIRA; There were several suggestions which came from Weldon
> > to improve safepoint callback mechanism which we can incorporate into
> > single patch and also it would help to review those changes faster and
> > easier (if it's not so hard, of 'cause).
> >
> > 3) Third problem in your list is a tricky one, it leads to separating
> > suspend into user requested and system wide, because first one should
> > be interrupted by stop (according to RI) and second one seems to be
> > stop proof.
>
> Yes, I was thinking about that as well. It would be nice if we can
> find another solution which is easier than differentiation between
> "internal" and "external" requests.
>
> > Personally I prefer to exclude this test until the problem
> > is resolved and check your changes in for the sake of safety.
>
> Me too.
>
> > By the
> > way, does shutdown sequence workaround suspended threads correctly?
>
> No. In case of shutdown we can't even reset suspend_request counter in
> the callback function because it is external component to TM.
>
> > And I would also ask Ivan Popov if such a scenario may be reserved for
> > JDWP?
> >
> > Thank you.
> >   Nik.
> >
>
> Thanks
> Evgueni
> > On 12/11/06, Evgueni Brevnov <evgueni.brevnov@gmail.com> wrote:
> > > Hi All,
> > >
> > > While working on http://issues.apache.org/jira/browse/HARMONY-2391 I
> > > found out that current implementation of safe-point callbacks works
> > > incorrectly/unsafely. Here is a list of problems:
> > >
> > > 1) hythread_suspend & hythread_resume are used asynchronously what may
> > > lead to a deadlock. Here is the scenario:
> > >     a) T1 wants to set a safe-point callback to T2. So it calls
> > > hythread_set_safe_point_callback(T2, the_callback).
> > >     b) T1 sets T2->safepoint_callback to the_callback;
> > >     c) T2 executes hythread_resume(self) which is under if
> > > (thread->safepoint_callback) statement.
> > >     d) T1 calls send_suspend_request(T2). So T2 will never be resumed
> > > because it already invoked corresponding hythread_resume(self)
> > > statement on the previous step.
> > >
> > > 2) Current implementation sets suspend_disable_count to 1
> > > (thread_native_suspend.c:162) and allows execution of unsafe code
> > > (which must be executed under suspend disabled state) while GC may be
> > > working.....
> > >
> > > 3) In stop_callback (thread_java_basic.c:413) suspend_request for the
> > > current thread is set to zero. So the thread just ignores
> > > suspend_requests and continue to do its dirty things.
> > >
> > > All these problems are fixed in the HARMONY-2391. But
> > > org.apache.harmony.luni.tests.java.lang.ThreadGroupTest.test_suspend
> > > starts to fail. The scenario is as follows:
> > >
> > > 1) T1 suspends T2 (so the suspend_request is >0).
> > > 2) T1 stops T2 by setting up the stop_callback to T2.
> > > 3) T2 needs to switch to suspend disabled state to be able to throw
> > > exception but it can't do that because of step 1)
> > >
> > > From the one hand, If I rollback my changes for the 3) problem
> > > described in the first list the this test works fine. From the other
> > > hand, I can't do it because it may lead to a crash if GC is really
> > > working at the time I want to throw an exception. I don't see how to
> > > fix it in an easy way right now.... but want to proceed with
> > > HARMONY-2391.
> > >
> > > What would you recommend to do?
> > > 1) Commit the HARMONY-2391 patch as is. File a JIRA regarding failing
> > > case. Exclude the test until the bug is fixed.
> > > 2) Commit the HARMONY-2391 patch with 3) undone. File a JIRA. In this
> > > case it is possible to have intermittent failures until the problem is
> > > not fixed.
> > >
> > > Thanks
> > > Evgueni
> > >
> >
>

Mime
View raw message