hadoop-pig-dev mailing list archives

From Mridul Muralidharan <mrid...@yahoo-inc.com>
Subject Re: A proposal for changing pig's memory management
Date Wed, 20 May 2009 05:30:33 GMT

I am still not very convinced of the value of this implementation, 
particularly considering the advances made in memory allocators and 
garbage collection since Java 1.3.

The side effects of this proposal are many, and sometimes non-obvious: 
for example, implicitly promoting young-generation data into the old 
generation (which puts much more pressure on the GC), fragmentation of 
memory blocks (which causes considerable memory pressure of its own), 
duplicating much of the functionality the garbage collector already 
provides, the possibility of bugs in reference counting, etc.

If the assumption is that the current working set of bags/tuples does 
not need to be spilled and everything else can be, then in the worst 
case this will pretty much degenerate into the current implementation.




A much simpler way to gain these benefits would be to handle 
primitives as ... primitives, and not through their Java wrapper 
classes.
It should be possible to write schema-aware tuples that use the 
specified primitives directly and take a fraction of the memory 
currently required (for an int: 4 bytes + a null-check boolean + an 
offset mapping, instead of the 24/32 bytes it takes today, etc.).
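A minimal sketch of the schema-aware tuple idea, assuming a schema made only of `int` and `double` fields. The names `PrimitiveTupleSketch`, `Schema`, `PrimitiveTuple` and `FieldType` are hypothetical and are not Pig's actual `Tuple` interface. Values are stored unboxed in per-type arrays, one boolean per field records nulls, and the schema's `slot` array is the offset mapping from field index to array position, built once and shared by every tuple with that schema.

```java
import java.util.Arrays;

// Hypothetical sketch of a schema-aware tuple; NOT Pig's real Tuple API.
public class PrimitiveTupleSketch {

    // Assumption: only int and double fields are handled in this sketch.
    enum FieldType { INT, DOUBLE }

    // Built once per schema and shared by every tuple that uses it.
    static final class Schema {
        final FieldType[] types;
        final int[] slot; // offset mapping: field index -> position in its type's array
        final int intCount, doubleCount;

        Schema(FieldType... types) {
            this.types = types.clone();
            this.slot = new int[types.length];
            int ints = 0, doubles = 0;
            for (int i = 0; i < types.length; i++) {
                if (types[i] == FieldType.INT) slot[i] = ints++;
                else slot[i] = doubles++;
            }
            this.intCount = ints;
            this.doubleCount = doubles;
        }
    }

    // Stores values unboxed, plus one null flag per field.
    static final class PrimitiveTuple {
        private final Schema schema;
        private final int[] ints;
        private final double[] doubles;
        private final boolean[] nulls;

        PrimitiveTuple(Schema schema) {
            this.schema = schema;
            this.ints = new int[schema.intCount];
            this.doubles = new double[schema.doubleCount];
            this.nulls = new boolean[schema.types.length];
            Arrays.fill(nulls, true); // every field starts out null
        }

        void setInt(int field, int v) {
            expect(field, FieldType.INT);
            ints[schema.slot[field]] = v;
            nulls[field] = false;
        }

        void setDouble(int field, double v) {
            expect(field, FieldType.DOUBLE);
            doubles[schema.slot[field]] = v;
            nulls[field] = false;
        }

        void setNull(int field) { nulls[field] = true; }

        boolean isNull(int field) { return nulls[field]; }

        // Primitive getters never box. Check isNull first: a null field reads as 0 or a stale value.
        int getInt(int field) { expect(field, FieldType.INT); return ints[schema.slot[field]]; }

        double getDouble(int field) { expect(field, FieldType.DOUBLE); return doubles[schema.slot[field]]; }

        // Boxed view for code that still expects Objects; boxes only when called.
        Object get(int field) {
            if (nulls[field]) return null;
            int s = schema.slot[field];
            if (schema.types[field] == FieldType.INT) return ints[s]; // boxed to Integer
            return doubles[s];                                        // boxed to Double
        }

        private void expect(int field, FieldType t) {
            if (schema.types[field] != t) {
                throw new IllegalArgumentException(
                        "field " + field + " is " + schema.types[field] + ", not " + t);
            }
        }
    }

    public static void main(String[] args) {
        Schema schema = new Schema(FieldType.INT, FieldType.DOUBLE, FieldType.INT);
        PrimitiveTuple t = new PrimitiveTuple(schema);
        t.setInt(0, 42);
        t.setDouble(1, 1.5);
        System.out.println(t.getInt(0));     // 42
        System.out.println(schema.slot[2]);  // 1 (second int field -> ints[1])
        System.out.println(t.get(1));        // 1.5
        System.out.println(t.isNull(2));     // true (never set)
    }
}
```

Since the offset mapping lives in the shared schema, each tuple pays only for its primitive arrays and null flags, and `get` allocates a wrapper only when legacy code asks for an `Object`. Using one array per type is a simplification for the sketch; actual per-object sizes depend on the JVM, so any savings would have to be measured rather than assumed.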



Regards,
Mridul

Alan Gates wrote:
> http://wiki.apache.org/pig/PigMemory
> 
> Alan.

