harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Evgueni Brevnov" <evgueni.brev...@gmail.com>
Subject Re: [drlvm][threading] Potential race condition in safe-point callbacks.
Date Tue, 12 Dec 2006 08:14:24 GMT
Nikolay,

I just tried to separate shutdown related from safe-point callback
stuff. No problems were identified so far. So I will file a separate
JIRA for callback related things.

On 12/12/06, Nikolay Kuznetsov <nikolay.kuznetsov@gmail.com> wrote:
> Evgueni,
>
> sorry for intrusion, at the very last moment, in general I agree with
> all mentioned problems and appreciate your work done for shutdown and
> suspension enhancements but if you don't mind I have I have few
> questions or suggestions:
>
> 1) Am I right, that all these scenarios came from shutdown sequence,
> I'm asking because, I would discuss some points but by no means I
> don't want to stop your shutdown patch which is good in general, but
> if those changes can go to the code base separately it would be
> preferable.
>
> 2) May I ask you to separate safepoint_callback related fixes into
> separate JIRA; There were several suggestions which came from Weldon
> to improve safepoint callback mechanism which we can incorporate into
> single patch and also it would help to review those changes faster and
> easier (if it's not so hard, of 'cause).
>
> 3) Third problem in your list is a tricky one, it leads to separating
> suspend into user requested and system wide, because first one should
> be interrupted by stop (according to RI) and second one seems to be
> stop proof.

Yes, I was thinking about that as well. It would be nice if we can
find another solution which is easier than differentiation between
"internal" and "external" requests.

> Personally I prefer to exclude this test until the problem
> is resolved and check your changes in for the sake of safety.

Me too.

> By the
> way, does shutdown sequence workaround suspended threads correctly?

No. In case of shutdown we can't even reset suspend_request counter in
the callback function because it is external component to TM.

> And I would also ask Ivan Popov if such a scenario may be reserved for
> JDWP?
>
> Thank you.
>   Nik.
>

Thanks
Evgueni
> On 12/11/06, Evgueni Brevnov <evgueni.brevnov@gmail.com> wrote:
> > Hi All,
> >
> > While working on http://issues.apache.org/jira/browse/HARMONY-2391 I
> > found out that current implementation of safe-point callbacks works
> > incorrectly/unsafely. Here is a list of problems:
> >
> > 1) hythread_suspend & hythread_resume are used asynchronously what may
> > lead to a deadlock. Here is the scenario:
> >     a) T1 wants to set a safe-point callback to T2. So it calls
> > hythread_set_safe_point_callback(T2, the_callback).
> >     b) T1 sets T2->safepoint_callback to the_callback;
> >     c) T2 executes hythread_resume(self) which is under if
> > (thread->safepoint_callback) statement.
> >     d) T1 calls send_suspend_request(T2). So T2 will never be resumed
> > because it already invoked corresponding hythread_resume(self)
> > statement on the previous step.
> >
> > 2) Current implementation sets suspend_disable_count to 1
> > (thread_native_suspend.c:162) and allows execution of unsafe code
> > (which must be executed under suspend disabled state) while GC may be
> > working.....
> >
> > 3) In stop_callback (thread_java_basic.c:413) suspend_request for the
> > current thread is set to zero. So the thread just ignores
> > suspend_requests and continue to do its dirty things.
> >
> > All these problems are fixed in the HARMONY-2391. But
> > org.apache.harmony.luni.tests.java.lang.ThreadGroupTest.test_suspend
> > starts to fail. The scenario is as follows:
> >
> > 1) T1 suspends T2 (so the suspend_request is >0).
> > 2) T1 stops T2 by setting up the stop_callback to T2.
> > 3) T2 needs to switch to suspend disabled state to be able to throw
> > exception but it can't do that because of step 1)
> >
> > From the one hand, If I rollback my changes for the 3) problem
> > described in the first list the this test works fine. From the other
> > hand, I can't do it because it may lead to a crash if GC is really
> > working at the time I want to throw an exception. I don't see how to
> > fix it in an easy way right now.... but want to proceed with
> > HARMONY-2391.
> >
> > What would you recommend to do?
> > 1) Commit the HARMONY-2391 patch as is. File a JIRA regarding failing
> > case. Exclude the test until the bug is fixed.
> > 2) Commit the HARMONY-2391 patch with 3) undone. File a JIRA. In this
> > case it is possible to have intermittent failures until the problem is
> > not fixed.
> >
> > Thanks
> > Evgueni
> >
>

Mime
View raw message