harmony-dev mailing list archives

From "Geir Magnusson Jr." <g...@pobox.com>
Subject Re: [drlvm] stress.Mix / MegaSpawn threading bug
Date Wed, 10 Jan 2007 12:37:52 GMT

On Jan 9, 2007, at 9:02 AM, Gregory Shimansky wrote:

> Geir Magnusson Jr. wrote:
>> I started a new thread because I think this is really important.
>> I've also added a page in the wiki to track this stuff, because I  
>> can't keep it in my head:
>>   http://wiki.apache.org/harmony/MegaSpawnThreadingBug
>> which you can get to from the home page via the "WhiteBoards"  
>> section, intended to be a place where we can work as a team on a  
>> whiteboard, with the intention that once the mini-project is over,  
>> we erase...
>> I think this is a scary scary problem :)
>
> I've tried to analyze the MegaSpawn test on Windows and here's what I
> found out.
>
> OOME is thrown because process virtual size easily gets up to 2Gb.  
> This happens at about ~1.5k simultaneously running threads. I think  
> it happens because all of virtual process memory is mapped for  
> thread stacks.
>
> When virtual memory is exhausted, all kinds of problems may occur. In
> many places there are assertions that malloc returns non-NULL, and
> these assertions fail. In some places there are no checks for
> malloc, and a NULL pointer is dereferenced, which also crashes the VM.
>

This is actually good news (I think), as I'd rather be running out of
heap than trashing it.

This is also useful for hardening - we should spend some time finding
places where we aren't checking the result of malloc and similar calls.
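
As a rough sketch of the kind of check I mean (the names checked_malloc,
CHECKED_MALLOC and make_buffer are hypothetical, not existing DRLVM
functions): log the failing call site and hand the failure back to the
caller, rather than asserting or using the NULL pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a DRLVM API): wraps malloc and logs the call
 * site on failure. It still returns NULL, so the caller must check. */
static void *checked_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "malloc(%lu) failed at %s:%d\n",
                (unsigned long)size, file, line);
    }
    return p;
}

#define CHECKED_MALLOC(size) checked_malloc((size), __FILE__, __LINE__)

/* Hypothetical caller: reports failure through its return value instead
 * of asserting, so the VM could raise OOME rather than crash. */
static int make_buffer(size_t n, char **out)
{
    char *buf;

    if (n == 0) {
        return -1;              /* nothing sensible to allocate */
    }
    buf = CHECKED_MALLOC(n);
    if (buf == NULL) {
        return -1;              /* out of memory: let the caller decide */
    }
    buf[0] = '\0';
    *out = buf;
    return 0;
}
```

How the -1 turns into an OOME is up to the caller; the point is only
that nothing downstream ever sees an unchecked NULL.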


> I tried to watch the Sun implementation and it looks like they map
> smaller amounts of memory for thread stacks. Maybe they map only
> initial stack memory somehow and allow it to grow later (although I
> don't quite understand how it is possible in a contiguous address
> space). When the Sun VM executes this test it creates up to ~6k
> simultaneously running threads, and process size at the same moment
> is smaller than 2Gb.
>
> I think the same problem may happen on Linux because it spills out  
> OOMEs on Ubuntu as well.
>
> If the test somehow doesn't crash on failed mallocs and gets to the
> shutdown stage, it hangs with 2 or more deadlocked threads. So far
> I haven't quite understood how they lock each other.

Cool - thanks.  If you have a free second, could you note this on the  
wiki page so we don't forget?
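
On the thread-stack point, a back-of-the-envelope helper makes the
numbers easier to reason about. This is only a sketch under assumptions:
max_threads is a hypothetical name, and the 1 MB stack reservation and
512 MB of other usage in the note below are illustrative guesses, not
measurements from DRLVM.

```c
#include <stdio.h>

/* Hypothetical helper: upper bound on the number of threads if every
 * thread reserves stack_bytes of address space and other_bytes are
 * already taken by heap, code and so on. */
static unsigned long long max_threads(unsigned long long address_space_bytes,
                                      unsigned long long other_bytes,
                                      unsigned long long stack_bytes)
{
    if (stack_bytes == 0 || other_bytes >= address_space_bytes) {
        return 0;
    }
    return (address_space_bytes - other_bytes) / stack_bytes;
}
```

With a 2 GB address space, an assumed 1 MB reserved per stack and an
assumed 512 MB used elsewhere, this gives 1536 threads, in the same
range as the ~1.5k Gregory saw. With 256 KB per stack and nothing else
reserved it gives 8192. So how much we reserve per thread is worth
checking first.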

geir

>
> -- 
> Gregory
>

