harmony-dev mailing list archives

From "Geir Magnusson Jr." <g...@pobox.com>
Subject Re: [drlvm] stress.Mix / MegaSpawn threading bug
Date Wed, 10 Jan 2007 12:37:52 GMT

On Jan 9, 2007, at 9:02 AM, Gregory Shimansky wrote:

> Geir Magnusson Jr. wrote:
>> I started a new thread because I think this is really important.
>> I've also added a page in the wiki to track this stuff, because I  
>> can't keep it in my head:
>>   http://wiki.apache.org/harmony/MegaSpawnThreadingBug
>> which you can get to from the home page via the "WhiteBoards"  
>> section, intended to be a place where we can work as a team on a  
>> whiteboard, with the intention that once the mini-project is over,  
>> we erase...
>> I think this is a scary scary problem :)
> I've tried to analyze MegaSpawn test on windows and here's what I  
> found out.
> OOME is thrown because the process virtual size easily reaches 2GB.
> This happens at about 1.5k simultaneously running threads. I think
> it happens because all of the process's virtual address space is
> mapped for thread stacks.
> When virtual memory is exhausted, all kinds of problems may occur. In
> many places there are assertions that malloc returns non-NULL, and
> these assertions fail. In other places there are no checks on
> malloc at all, and a NULL pointer is dereferenced, which also
> crashes the VM.

This is actually good news (I think), as I'd rather be running out of
heap than trashing it.

This is also useful for hardening - we should spend some time finding
places where we aren't checking mallocs and such.
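As a starting point for that hardening pass, here is a minimal sketch of the pattern, assuming the fix is to report allocation failure to the caller instead of asserting or dereferencing NULL. The names `alloc_array` and `make_table` are hypothetical illustrations, not existing DRLVM functions; the overflow check mirrors what `calloc` does internally.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `count` elements of `size` bytes.
   Returns NULL (instead of asserting) if count * size would overflow
   or if malloc itself fails, so the caller can handle it. */
void *alloc_array(size_t count, size_t size) {
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;              /* count * size would overflow */
    return malloc(count * size);  /* may also return NULL */
}

/* Hypothetical caller: propagate the failure upward (e.g. so the VM
   can raise OutOfMemoryError) rather than using a NULL pointer. */
int make_table(size_t n, int **out) {
    int *table = alloc_array(n, sizeof *table);
    if (table == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        table[i] = 0;
    *out = table;
    return 0;
}
```

The point is that every allocation site gets an explicit failure path; whether that path throws an OOME or shuts down cleanly would be decided per call site.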

> I looked at how the Sun implementation behaves, and it looks like it
> maps smaller amounts of memory for thread stacks. Maybe it maps only
> the initial stack memory somehow and allows it to grow later
> (although I don't quite understand how that is possible in a
> contiguous address space). When the Sun VM executes this test, it
> creates up to ~6k simultaneously running threads, and the process
> size at that moment stays below 2GB.
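Taking these figures at face value, and assuming (as suspected above) that thread stacks account for nearly all of the 2GB, the per-thread reservation works out roughly as follows. Here 2GB is read as 2^31 bytes, the default user-mode address space of a 32-bit Windows process; the Sun figure is only an upper bound, since that process stayed below 2GB.

$$
\frac{2^{31}\ \text{B}}{1500\ \text{threads}} \approx 1.43\times 10^{6}\ \text{B} \approx 1.37\ \text{MiB per thread (DRLVM)}
$$

$$
\frac{2^{31}\ \text{B}}{6000\ \text{threads}} \approx 3.58\times 10^{5}\ \text{B} \approx 350\ \text{KiB per thread (Sun, at most)}
$$

So the Sun VM appears to reserve at most about a quarter of the address space per thread that DRLVM does.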
> I think the same problem may happen on Linux, because the test
> throws OOMEs on Ubuntu as well.
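For the Linux side, one knob worth trying, assuming DRLVM creates its threads through pthreads there, is an explicit, smaller per-thread stack size. The sketch below starts one thread with a caller-chosen stack size via `pthread_attr_setstacksize`; the helper names `worker` and `run_with_stack` are hypothetical, not DRLVM code. Because each thread's stack is reserved out of the process address space, a smaller value should let more threads fit before that space runs out.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body: records that it actually ran. */
static void *worker(void *arg) {
    *(int *)arg = 1;
    return NULL;
}

/* Hypothetical helper: start one thread with an explicit stack size,
   join it, and return 1 if the thread ran, 0 on any failure. */
int run_with_stack(size_t stack_bytes) {
    pthread_attr_t attr;
    pthread_t tid;
    int ran = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    /* Rejected with EINVAL if stack_bytes < PTHREAD_STACK_MIN. */
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
        pthread_attr_destroy(&attr);
        return 0;
    }
    if (pthread_create(&tid, &attr, worker, &ran) != 0) {
        pthread_attr_destroy(&attr);
        return 0;
    }
    /* Destroying the attributes does not affect the running thread. */
    pthread_attr_destroy(&attr);
    pthread_join(tid, NULL);  /* join makes the write to `ran` visible */
    return ran;
}
```

Whether a smaller stack is actually safe depends on how deep the VM's native and Java frames get, so any real value would need to be tested against the stress suite.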
> If the test somehow doesn't crash on failed mallocs, it gets to the
> shutdown stage and hangs with 2 or more deadlocked threads. So far
> I don't quite understand how they lock each other.

Cool - thanks.  If you have a free second, could you note this on the  
wiki page so we don't forget?


> -- 
> Gregory
