harmony-dev mailing list archives

From "Geir Magnusson Jr." <g...@pobox.com>
Subject Re: [drlvm] stress.Mix / MegaSpawn threading bug
Date Wed, 10 Jan 2007 13:07:53 GMT

On Jan 9, 2007, at 10:51 AM, Weldon Washburn wrote:

> On 1/9/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
>>
>> Geir Magnusson Jr. wrote:
>> > I started a new thread because I think this is really important.
>> >
>> > I've also added a page in the wiki to track this stuff, because I can't
>> > keep it in my head:
>> >
>> >   http://wiki.apache.org/harmony/MegaSpawnThreadingBug
>> >
>> > which you can get to from the home page via the "WhiteBoards" section,
>> > intended to be a place where we can work as a team on a whiteboard, with
>> > the intention that once the mini-project is over, we erase...
>
>
> This is a good idea.  I still want to put some of the discussion on email so
> that we have a permanent record of our investigations.  I have some thoughts
> inlined below.
>
>>
>> > I think this is a scary scary problem :)
>>
>> I've tried to analyze the MegaSpawn test on Windows and here's what I
>> found out.
>>
>> OOME is thrown because the process virtual size easily gets up to 2 GB.
>> This happens at ~1.5k simultaneously running threads. I think it happens
>> because all of the process's virtual memory is mapped for thread stacks.
>>
>> When virtual memory is exhausted all kinds of problems may occur. In many
>> places there are assertions that malloc returns non-NULL, and these
>> assertions fail. In some places there are no checks for malloc, and a NULL
>> pointer is used for addressing, which also crashes the VM.
>
>
> Good job!  I got the same sort of hunch when I looked at the source code but
> did not have enough time to pin down specifics.  The only guidance I found
> regarding what happens when too many threads are spawned is the following in
> the java.lang.Thread reference manual, "...specifying a lower [stacksize]
> value may allow a greater number of threads to exist concurrently without
> throwing an OutOfMemoryError (or other internal error)."
>
> I think what the above implies is that it is OK for the JVM to error and
> exit if the app tries to create too many threads.  If this is the case, it
> sort of looks like we need to clean up the handling of malloc() errors so
> that the JVM can exit gracefully.

Well - I think that we should strive to maintain an internally  
consistent VM, throw an OOM, and let the app decide.  There are  
situations where with a solid VM, you can deal w/ the OOM at the app  
level.
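
To illustrate what dealing with the OOM at the app level could look like, here is a minimal sketch. It assumes the VM stays internally consistent and that Thread.start() reports resource exhaustion as an OutOfMemoryError. The SpawnGuard class and its method names are hypothetical, made up for this example; they are not part of Harmony or DRLVM.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Harmony or DRLVM.
final class SpawnGuard {

    /**
     * Tries to start the thread. Returns false instead of propagating
     * OutOfMemoryError, so the caller can back off or shed load.
     */
    static boolean tryStart(Thread t) {
        try {
            t.start();
            return true;
        } catch (OutOfMemoryError e) {
            // Assumption: the VM is still usable after reporting the error.
            return false;
        }
    }

    /**
     * Starts threads in order and stops at the first failure.
     * Returns how many threads were actually started.
     */
    static int startAll(List<Thread> threads) {
        int started = 0;
        for (Thread t : threads) {
            if (!tryStart(t)) {
                break;
            }
            started++;
        }
        return started;
    }
}
```

An app could call SpawnGuard.startAll(workers) and, if fewer threads started than requested, queue the remaining work rather than die. This only helps if the VM itself survives the failed allocation, which is the point above.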

>
> Another approach would be to throw something like a
> "TooManyThreadsAtOnceException" and keep running the app.  I can't find
> anything like this kind of exception.  It's probably not an option.

No :)

>
> Another approach would be to make Thread.start() method wait until there
> are enough resources to create a new thread.  Most likely the app would hang
> mysteriously without warning.  This is probably not an option either.

Nope :)

>
> Another item we need to discuss is what are the Q1/Q2 goals for max number
> of threads supported?  It seems we can do lots of useful stuff with a max of
> 1500 threads.  The useful stuff being items like the bringup of enterprise
> apps, fixing stability problems...

I don't mind a suboptimal # of concurrent threads - we can work on  
that over time.  The fact that the VM falls over dead scares the  
bejeezus out of me.
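
For a rough sense of the numbers in this thread, the sketch below divides a 2 GB address space by the thread counts Gregory reported (about 1.5k for us, about 6k for the Sun VM). It assumes, as a simplification, that thread stacks are the only consumer of address space, so the results are upper bounds: roughly 1.4 MB per thread at 1500 threads. It also shows the standard four-argument java.lang.Thread constructor that takes a stackSize hint. The StackBudget class and the 256 KB request are illustrative assumptions, and a VM is free to round or ignore the hint.

```java
// Hypothetical helper for illustration only; not part of Harmony or DRLVM.
final class StackBudget {

    /**
     * Upper bound on the per-thread stack reservation, assuming stacks
     * were the only thing occupying the address space.
     */
    static long bytesPerThread(long addressSpaceBytes, int threads) {
        return addressSpaceBytes / threads;
    }

    /**
     * Creates a thread that requests the given stack size. The value is
     * only a hint: the VM may round it or ignore it entirely.
     */
    static Thread withStackSize(Runnable task, String name, long stackSizeBytes) {
        return new Thread(null, task, name, stackSizeBytes);
    }

    public static void main(String[] args) throws InterruptedException {
        long twoGiB = 2L * 1024 * 1024 * 1024;
        System.out.println("1500 threads: " + bytesPerThread(twoGiB, 1500) + " bytes each");
        System.out.println("6000 threads: " + bytesPerThread(twoGiB, 6000) + " bytes each");

        // 256 KB is an arbitrary value chosen for this example.
        Thread t = withStackSize(() -> System.out.println("running"), "small-stack", 256 * 1024);
        t.start();
        t.join();
    }
}
```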

geir

>>
>> I tried to watch the Sun implementation and it looks like they map smaller
>> amounts of memory for thread stacks. Maybe they map only initial stack
>> memory somehow and allow it to grow later (although I don't quite
>> understand how it is possible in a contiguous address space). When the Sun
>> VM executed this test it created up to ~6k simultaneously running threads
>> and the process size at the same moment was smaller than 2 GB.
>>
>> I think the same problem may happen on Linux because it spills out OOMEs
>> on Ubuntu as well.
>>
>> If the test somehow doesn't crash on failed mallocs and gets to the
>> shutdown stage, it hangs with 2 or more deadlocked threads. So far I
>> haven't quite understood how they lock each other.
>>
>> --
>> Gregory
>>
>>
>
>
> -- 
> Weldon Washburn
> Intel Enterprise Solutions Software Division

