hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alan Gates <ga...@yahoo-inc.com>
Subject Re: A proposal for changing pig's memory management
Date Tue, 19 May 2009 22:44:38 GMT
The claims in the paper I was interested in were not issues like non- 
blocking I/O etc.  The claim that is of interest to pig is that a  
memory allocation and garbage collection scheme that is beyond the  
control of the programmer is a bad fit for a large data processing  
system.  This is a fundamental design choice in Java, and fits it well  
for the vast majority of its uses.  But for systems like Pig there  
seems to be no choice but to work around Java's memory management.   
I'll clarify this point in the document.

I took a closer look at NIO.  My concern is that it does not give the  
level of control I want.  NIO allows you to force a buffer to disk and  
request a buffer to load, but you cannot force a page out of memory.   
It doesn't even guarantee that after you load a page it will really be  
loaded.  One of the biggest issues in pig right now is that we run out  
memory or get the garbage collector in a situation where it can't make  
sufficient progress.  Perhaps switching to large buffers instead of  
having many individual objects will address this.  But I'm concerned  
that if we cannot explicitly force data out of memory onto disk then  
we'll be back in the same boat of trusting the Java memory manager.

Alan.

On May 14, 2009, at 7:43 PM, Ted Dunning wrote:

> That Telegraph dataflow paper is pretty long in the tooth.  Certainly
> several of their claims have little force any more (lack of non- 
> blocking
> I/O, poor thread performance, no unmap, very expensive  
> synchronization for
> uncontested locks).  It is worth that they did all of their tests on  
> the 1.3
> JVM and things have come an enormous way since then.
>
> Certainly, it is worth having opaque contains based on byte arrays,  
> but
> isn't that pretty much what the NIO byte buffers are there to provide?
> Wouldn't a virtual tuple type that was nothing more than a byte  
> buffer, type
> and an offset do almost all of what is proposed here?
>
> On Thu, May 14, 2009 at 5:33 PM, Alan Gates <gates@yahoo-inc.com>  
> wrote:
>
>> http://wiki.apache.org/pig/PigMemory
>>
>> Alan.
>>


Mime
View raw message