harmony-dev mailing list archives

From Gregory Shimansky <gshiman...@apache.org>
Subject Re: [drlvm] run of smoke tests on overloaded box
Date Sun, 17 Jun 2007 20:38:30 GMT
Rana Dasgupta wrote:
> I could finally repro a couple of these cases (on Linux x64) and I
> think that this problem is happening because of the known weakness in
> shutdown of daemon threads.  gdb shows a SIGSEGV in the cancel
> handler, usually reporting a zombie thread.
> 
> In the current shutdown we register safepoint shutdown callbacks and
> do timed joins, waiting for the daemon threads to exit. We make some
> reasonable guess on the join timeout interval. After this, we kill the
> threads (on Linux, with pthread_cancel). When the cycle eater runs
> in the background, the join interval we have chosen is not enough. But
> sometimes, between the time we give up on the joins and the time we post
> the cancel signals, the thread (joinable by default, not detached)
> finally completes the safepoint shutdown callback
> and exits. It is now a zombie or whatever, and would release all

Could it be that some additional synchronization of thread state would 
help here? I mean there should be some safe way of terminating threads 
that might be finishing at the same time.

Or it could also be a bug in the pthread_cancel implementation of the 
glibc version on the system that you are using. Which Linux version do 
you have?
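
To make the synchronization idea concrete, here is a minimal sketch in C of
one way it might look. It is not DRLVM code: daemon_t, daemon_start and
daemon_stop are hypothetical names, and nanosleep() stands in for the
safepoint shutdown callback. The assumption is that each thread sets a
"done" flag under a mutex just before it returns, and the shutdown path
only calls pthread_cancel() while holding that mutex and seeing the flag
still unset, so the target cannot have exited yet. The thread is always
joined afterwards, so no zombie is left behind.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-thread record; not a DRLVM structure. */
typedef struct {
    pthread_t       tid;
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    bool            done;     /* set by the thread just before it returns */
    int             work_ms;  /* simulated length of the shutdown callback */
} daemon_t;

static void *daemon_main(void *arg)
{
    daemon_t *d = arg;
    struct timespec ts = { d->work_ms / 1000,
                           (long)(d->work_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);            /* cancellation point (deferred cancel) */

    pthread_mutex_lock(&d->lock);    /* not a cancellation point */
    d->done = true;
    pthread_cond_signal(&d->done_cv);
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

static int daemon_start(daemon_t *d, int work_ms)
{
    d->done = false;
    d->work_ms = work_ms;
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->done_cv, NULL);
    return pthread_create(&d->tid, NULL, daemon_main, d);
}

/* Returns true if the thread reported completion within timeout_ms,
 * false if a cancel had to be requested. The thread is joined either way. */
static bool daemon_stop(daemon_t *d, int timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&d->lock);
    int rc = 0;
    while (!d->done && rc == 0)      /* loop guards against spurious wakeups */
        rc = pthread_cond_timedwait(&d->done_cv, &d->lock, &deadline);
    bool finished = d->done;
    if (!finished)
        pthread_cancel(d->tid);      /* lock held: thread has not set done,
                                        so it has not exited yet */
    pthread_mutex_unlock(&d->lock);

    int jrc = pthread_join(d->tid, NULL);  /* always reap the thread */
    assert(jrc == 0);
    (void)jrc;
    pthread_cond_destroy(&d->done_cv);
    pthread_mutex_destroy(&d->lock);
    return finished;
}
```

With this shape, a thread that finishes between the timeout and the cancel
is either seen as done (and simply joined), or it is still blocked on the
mutex when the cancel is posted, so the cancel never targets a thread that
has already terminated. Whether this would remove the SIGSEGV seen here is
an open question; it only closes the race described in the quoted text.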

> resources on join. But in shutdown we have given up on join and have
> started pthread_cancel(). The cancel signal is not handled correctly on
> the zombie thread and raises SIGSEGV. I don't know Linux well enough to
> know the exact dynamics of zombies.
> 
> I multiplied the join timeout interval by a factor of 100 and the
> errors went away, with cycle eater running in the background. I don't
> think we want to make changes like this in the VM. This is not a good
> way to tune wall clock times ( some of which need to exist in the
> implementation ).
> 
> I also have some concern about how we are choosing to create these
> test scenarios. Artificial severe stress conditions can be simulated
> in tests creating failures that are time consuming to debug. But I
> don't know how much extra information they give us. For example, we
> already know that daemon thread shutdown is not perfect. If we choose
> to create stresses, I think that it is better to use real applications
> or well known workloads. In that case, failures would be more
> meaningful and would give us some good guidance on tuning things.
> 
> On 6/6/07, Vladimir Ivanov <ivavladimir@gmail.com> wrote:
>> issue HARMONY-4080 was created to track it.
>>
>>  thanks, Vladimir
>>
>> On 5/18/07, Vladimir Ivanov <ivavladimir@gmail.com> wrote:
>> > The CC/CI report failures just now on linux x86_64 in default mode:
>> > -----------------------------
>> > Running test : thread.ThreadInterrupt
>> > *** FAILED **** : thread.ThreadInterrupt (139 res code)
>> > -----------------------------
>> >
>> >  thanks, Vladimir
>> >
>> >
>> > On 5/18/07, Rana Dasgupta <rdasgupt@gmail.com> wrote:
>> > > OK, I will also try to change this test to make it more meaningful
>> > > than it is now. We can then decide if we want to keep it or lose it?
>> > >
>> > > On 5/17/07, Pavel Rebriy <pavel.rebriy@gmail.com> wrote:
>> > > > Maybe it would be better to fix the tests instead?
>> > > > The test gc.ThreadSuspension checks the suspension model during
>> > > > garbage collection. It is a very useful test for the VM.
>> > > > --
>> > > > Best regards,
>> > > > Pavel Rebriy
>> > > >
>> > >
>> >
>>
> 


-- 
Gregory

