harmony-dev mailing list archives

From Gregory Shimansky <gshiman...@gmail.com>
Subject Re: [drlvm] stress.Mix / MegaSpawn threading bug
Date Thu, 11 Jan 2007 17:00:12 GMT
Geir Magnusson Jr. wrote:
> On Jan 10, 2007, at 9:00 AM, Gregory Shimansky wrote:
>> Geir Magnusson Jr. wrote:
>>>> I think the same problem may happen on Linux because it spills out 
>>>> OOMEs on Ubuntu as well.
>>>> If the test somehow doesn't crash on failed mallocs, it gets to the
>>>> shutdown stage and hangs with 2 or more deadlocked threads. So far
>>>> I haven't quite understood how they lock each other.
>>> Cool - thanks.  If you have a free second, could you note this on the 
>>> wiki page so we don't forget?
>> I think it is better to track this with JIRA. AFAIU it is not a
>> stress-conditions issue, so it is a normal bug which should be found
>> and fixed. I created a new JIRA, HARMONY-2963, which is a subtask of
>> HARMONY-2803, where Weldon attached his MegaSpawn test.
> Agreed that a JIRA is important - I just wanted to make sure that we 
> added it somehow to the whiteboard so we had a complete picture of 
> things related to this problem.

Today's investigation showed that threads hang at shutdown for 2 different
reasons. The first one was found by Salikh, and he described it in
HARMONY-2963. The bug happened because the counter of non-daemon threads
was increased before a thread was created. If the thread failed to be
created because there was no memory, the counter was not decreased back.
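A minimal sketch of this first bug, assuming POSIX threads and a simplified model in which VM shutdown waits for the non-daemon counter to reach zero. The names `nondaemon_count`, `start_nondaemon_thread` and the `simulate_oom` flag are hypothetical, not DRLVM's thread manager API. Because the counter is increased before the thread exists, a failed creation must undo the increment, otherwise shutdown waits for a thread that was never created:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a VM's non-daemon thread counter (not DRLVM code). */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t count_zero = PTHREAD_COND_INITIALIZER;
static int nondaemon_count = 0;

static void nondaemon_inc(void) {
    pthread_mutex_lock(&count_lock);
    nondaemon_count++;
    pthread_mutex_unlock(&count_lock);
}

static void nondaemon_dec(void) {
    pthread_mutex_lock(&count_lock);
    nondaemon_count--;
    assert(nondaemon_count >= 0);
    if (nondaemon_count == 0)
        pthread_cond_broadcast(&count_zero); /* wake a waiting shutdown */
    pthread_mutex_unlock(&count_lock);
}

static void *thread_body(void *arg) {
    (void)arg;
    /* ... user code would run here ... */
    nondaemon_dec(); /* the thread leaves the count when it exits */
    return NULL;
}

/* Returns 0 on success, -1 on failure.
 * simulate_oom is a hypothetical switch standing in for a failed allocation. */
static int start_nondaemon_thread(pthread_t *tid, bool simulate_oom) {
    nondaemon_inc(); /* counted before the thread exists */
    int err = simulate_oom ? -1 : pthread_create(tid, NULL, thread_body, NULL);
    if (err != 0) {
        nondaemon_dec(); /* the fix: roll back the increment on failure */
        return -1;
    }
    return 0;
}

/* Shutdown blocks until every counted non-daemon thread has exited. */
static void wait_for_nondaemon_threads(void) {
    pthread_mutex_lock(&count_lock);
    while (nondaemon_count > 0)
        pthread_cond_wait(&count_zero, &count_lock);
    pthread_mutex_unlock(&count_lock);
}
```

Without the `nondaemon_dec()` call in the error branch, the counter stays at 1 after the failed start and `wait_for_nondaemon_threads()` never returns.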

The second reason for hanging threads is that the parent thread waits in
Thread.start(). When a new thread is started, it has to notify a lock
object in order to signal the parent thread that it has been created.
This notification is sent from the Java code of Thread before any user
code is executed.

But the thread manager also has some native code which runs before the
Java code of the newly started thread. This native code sets up thread
state, such as a new JNI environment, and this requires allocating
memory. If the allocation fails, the native code of the new thread
returns an error, but nothing ever sees it: this is the first function of
the new thread, so there is no caller to check the result, and the
failure goes unnoticed. Since the native code of the new thread finishes
silently, the Java code that should notify the monitor never runs, so the
monitor is never notified and the parent thread waits forever.
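The hang can be modeled with a small POSIX-threads sketch. Everything here is hypothetical: `malloc(64)` stands in for the JNI environment setup, `fail_native` simulates the failed allocation, and `started` plays the role of the Java-level notification. The real Thread.start() waits with no timeout; the timed wait below exists only so the demonstration returns instead of hanging:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool started;      /* set by the child, like the Java-level notify */
    bool fail_native;  /* hypothetical switch: simulate OOM in the native prologue */
} start_sync;

static void *child_entry(void *arg) {
    start_sync *s = arg;
    /* Native prologue: set up per-thread state, which allocates memory. */
    void *env = s->fail_native ? NULL : malloc(64);
    if (env == NULL)
        return NULL; /* nobody inspects this result, and the parent is never notified */

    /* Java part of the new thread: notify the parent, then run user code. */
    pthread_mutex_lock(&s->lock);
    s->started = true;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
    free(env);
    return NULL;
}

/* Parent side. Returns true if the child signalled within timeout_sec. */
static bool start_and_wait(bool fail_native, int timeout_sec) {
    start_sync s;
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);
    s.started = false;
    s.fail_native = fail_native;

    bool started = false;
    pthread_t tid;
    if (pthread_create(&tid, NULL, child_entry, &s) == 0) {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_sec;

        int rc = 0;
        pthread_mutex_lock(&s.lock);
        while (!s.started && rc == 0) /* rc == 0 also covers spurious wakeups */
            rc = pthread_cond_timedwait(&s.cond, &s.lock, &deadline);
        assert(rc == 0 || rc == ETIMEDOUT);
        started = s.started;
        pthread_mutex_unlock(&s.lock);
        pthread_join(tid, NULL); /* demo only: makes destroying s below safe */
    }
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    return started;
}
```

With `fail_native` set, the child exits before signalling and only the artificial timeout lets the parent return.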

To fix this bug I think it is necessary to get rid of error conditions
in newly created threads. All necessary state should be allocated before
a new thread is started, so that if these resources cannot be allocated,
an error is returned to the parent thread, and it won't wait forever for
the new thread's start notification.
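A sketch of the proposed fix under the same assumptions (POSIX threads; `native_state`, `alloc_native_state` and `start_thread_prealloc` are hypothetical names). The parent does all fallible allocation before creating the thread and reports failure to its caller, so the child has no error path before it notifies the parent:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-thread native state (stands in for the JNI environment etc.). */
typedef struct {
    char scratch[64];
} native_state;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool started;
    native_state *state; /* allocated by the parent, owned by the child */
} start_args;

static void *child_entry(void *arg) {
    start_args *a = arg;
    native_state *state = a->state; /* read before signalling: 'a' lives on the parent's stack */
    assert(state != NULL);          /* guaranteed by the parent */

    pthread_mutex_lock(&a->lock);
    a->started = true;
    pthread_cond_signal(&a->cond);
    pthread_mutex_unlock(&a->lock);
    /* 'a' must not be touched from here on; user code would run using 'state'. */
    free(state);
    return NULL;
}

/* Hypothetical allocator hook so the demo can simulate running out of memory. */
static native_state *alloc_native_state(bool simulate_oom) {
    return simulate_oom ? NULL : malloc(sizeof(native_state));
}

/* Returns 0 once the child has signalled, -1 if resources were unavailable. */
static int start_thread_prealloc(pthread_t *tid, bool simulate_oom) {
    native_state *state = alloc_native_state(simulate_oom);
    if (state == NULL)
        return -1; /* reported in the parent (e.g. as an OutOfMemoryError, assumed) */

    start_args a;
    a.started = false;
    a.state = state;
    pthread_mutex_init(&a.lock, NULL);
    pthread_cond_init(&a.cond, NULL);

    int rc = -1;
    if (pthread_create(tid, NULL, child_entry, &a) == 0) {
        pthread_mutex_lock(&a.lock);
        while (!a.started) /* safe to wait: the child cannot fail before signalling */
            pthread_cond_wait(&a.cond, &a.lock);
        pthread_mutex_unlock(&a.lock);
        rc = 0;
    } else {
        free(state); /* thread creation itself failed: still reported to the caller */
    }
    pthread_cond_destroy(&a.cond);
    pthread_mutex_destroy(&a.lock);
    return rc;
}
```

The untimed wait is acceptable here only because every failure point now sits in the parent, before the child exists.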


View raw message