harmony-dev mailing list archives

From "Geir Magnusson Jr." <g...@pobox.com>
Subject Re: [drlvm] stress.Mix / MegaSpawn threading bug
Date Thu, 11 Jan 2007 01:13:17 GMT

On Jan 10, 2007, at 2:13 PM, Weldon Washburn wrote:

> Some observations and a simple patch that might just work well enough
> for the current state of DRLVM development.
> 1)
> In some earlier posting, it was mentioned that somehow the virtual
> memory address space is impacted by how much physical memory is in a
> given computer.  Actually this is not true.  The virtual address space
> available to the JVM is fixed by the OS.  A machine with less phys mem
> will do more disk I/O.  In other words "C" malloc() hard limits are
> set by OS version number, not by RAM chips.

Talking about VM vs RAM vs whatever is a red herring - we may be
ported to a machine w/o virtual memory.  What matters is that when
malloc() returns NULL, we do something smart.  At the very least, fail
gracefully instead of crashing.
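A minimal sketch of what "do something smart" at an allocation site could look like - the status codes and names here are hypothetical, not DRLVM's actual API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; illustrative only. */
typedef enum { TM_OK = 0, TM_ERR_OUT_OF_MEMORY = 1 } tm_status_t;

/* Allocate defensively: on failure, report a status the caller can
 * propagate instead of dereferencing NULL and crashing the VM. */
tm_status_t tm_alloc(void **out, size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        *out = NULL;
        return TM_ERR_OUT_OF_MEMORY;  /* caller unwinds, eventually throws OOME */
    }
    memset(p, 0, size);
    *out = p;
    return TM_OK;
}
```

Every caller would check the returned status and unwind rather than assume the pointer is valid.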

> 2)
> Why not simply hard code DRLVM to throw an OOME whenever there are
> more than 1K threads running?  I think Rana first suggested this
> approach.  My guess is that 1K threads is good enough to run lots of
> interesting workloads.  My guess is that common versions of WinXP and
> Linux will handle the C malloc() load of 1K threads successfully.  If
> not, how about trying 512 threads?
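For reference, the hard cap proposed in (2) amounts to something like this - the constant and all names are illustrative, not DRLVM's real threading code:

```c
#include <assert.h>

#define TM_MAX_THREADS 1024  /* the proposed "1K threads" cap */

static int tm_live_threads = 0;  /* the real VM would guard this with a lock */

/* Returns 0 if a new thread may start; nonzero when the cap is hit,
 * at which point the VM would throw OutOfMemoryError to the caller. */
int tm_thread_create_slot(void) {
    if (tm_live_threads >= TM_MAX_THREADS)
        return 1;
    tm_live_threads++;
    return 0;
}

void tm_thread_release_slot(void) {
    tm_live_threads--;
}
```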

Because this is picking up the rug and sweeping all the dirt
underneath it.  The core problem isn't that we try too many threads;
it's that the code wasn't written defensively.  Putting an artificial
limit on the number of threads just means that we'll hit the same kind
of failure somewhere else, in some other resource.

I think we should fix it.

There seem to be some basic things we can do, like reduce the stack
size on Windows from the terabyte or whatever it is now to the number
that our dear, esteemed colleague from IBM claims is perfectly
suitable for production use.

That doesn't solve the problem either, but it certainly fixes a
problem we are now aware of - our stack size is too big... :)
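On POSIX platforms the per-thread stack can be shrunk when the thread is created (a Windows port would pass the size to CreateThread instead).  The 256 KB figure below is purely illustrative, not a recommendation:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static void *worker(void *arg) {
    return arg;  /* trivial body; a real VM thread would run Java code */
}

/* Spawn a thread with an explicit (small) stack instead of the
 * platform default.  256 KB is an illustrative figure. */
int spawn_small_stack_thread(pthread_t *t) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, 256 * 1024);
    if (rc == 0)
        rc = pthread_create(t, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Note that pthread_attr_setstacksize() rejects anything below PTHREAD_STACK_MIN, so there is a floor on how far you can shrink it.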

> 3)
> The above does not deal with the general architecture question of
> handling C malloc failures.  This is far harder to solve.  Note that
> solving the big question will also require far more extensive
> regression tests than MegaSpawn.  However, it does fix DRLVM so that
> it does not crash/burn on threads overload.  This, in turn, gives us
> time to fix the real underlying problem(s) with C malloc.

From this perspective, I don't mind as much, as long as we fix the
stack sizes.  But a part of me wants to say no, because there's
nothing compelling us to actually fix the problem. :)

Maybe we set the number to something crippling, like 10 or so, which
will then motivate anyone who wants to do something useful with the VM.

I'm just always nervous about things like this...


> On 1/10/07, Geir Magnusson Jr. <geir@pobox.com> wrote:
>> On Jan 10, 2007, at 8:51 AM, Gregory Shimansky wrote:
>> > Geir Magnusson Jr. wrote:
>> >> The big thing for me is ensuring that we can drive the VM to the
>> >> limit, and it maintains internal integrity, so applications that
>> >> are designed to gracefully deal with resource exhaustion can do so
>> >> w/ confidence that the VM isn't about to crumble out from
>> >> underneath them.
>> >
>> > I agree with Geir that we should try to handle the out-of-C-heap
>> > condition gracefully.  The problem is that there is no clearly
>> > defined contract for many functions that use memory allocation
>> > about what to do in case of an out of memory condition.
>> >
>> > To maintain integrity, all VM functions which allocate memory from
>> > the C heap should return gracefully all the way up the stack until
>> > they hit the Java code that called them, and then an OOME exception
>> > shall be seen by the Java code.  It is not an easy task because all
>> > code paths should support it, including JIT and GC.
>> >
>> Agreed.  But certainly worth striving for :)
>> geir
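The contract Gregory describes - a status propagated all the way up the stack until the native entry point converts failure into an OutOfMemoryError - might be sketched like this; all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { VM_OK = 0, VM_OOM = 1 } vm_status;

/* Leaf function: the only place that calls malloc() directly. */
static vm_status build_frame_info(char **out, size_t n) {
    *out = malloc(n);
    return (*out == NULL) ? VM_OOM : VM_OK;
}

/* Intermediate layers just propagate the status upward unchanged. */
static vm_status prepare_call(char **info, size_t n) {
    return build_frame_info(info, n);
}

/* Outermost native entry point: on failure it would raise
 * OutOfMemoryError to the Java caller instead of crashing. */
bool native_entry(size_t n) {
    char *info = NULL;
    if (prepare_call(&info, n) != VM_OK)
        return false;  /* here the real VM would throw OOME */
    free(info);
    return true;
}
```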
> -- 
> Weldon Washburn
> Intel Enterprise Solutions Software Division
