pig-dev mailing list archives

From Daniel Dai <jiany...@yahoo-inc.com>
Subject Re: split operator
Date Mon, 26 Jul 2010 18:09:25 GMT
Hi, Gang,
Which part of the paper are you talking about? We don't do an in-memory
split. We dump the split result to a temporary file and start a new
map-reduce job, so split does create a map-reduce boundary. (That is not
entirely true, though: the multiquery optimizer may combine some of these jobs.)
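
As a rough illustration of the "dump to a temporary file, then start new jobs" behavior described above, here is a toy Python sketch. It is not Pig code and does not use any Pig API: the function name split_via_temp_file, the JSON-lines temporary format, and the sample predicates are all assumptions made up for the example. It only models the idea that the input is written once and each branch then re-reads it independently.

```python
import json
import os
import tempfile


def split_via_temp_file(records, branches):
    """Toy model of a split: write the input once, let each branch re-read it.

    `branches` maps a (hypothetical) branch name to a predicate function.
    """
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    try:
        # "First job": materialize the split input to a temporary file.
        with os.fdopen(fd, "w") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")

        # "Follow-on jobs": each branch reads the same file from the start
        # and applies its own condition.
        results = {}
        for name, predicate in branches.items():
            with open(path) as f:
                rows = (json.loads(line) for line in f)
                results[name] = [r for r in rows if predicate(r)]
        return results
    finally:
        os.remove(path)


out = split_via_temp_file(
    [{"x": 1}, {"x": 15}, {"x": 7}],
    {"small": lambda r: r["x"] < 10, "large": lambda r: r["x"] >= 10},
)
```

In this model the temporary file plays the role of the map-reduce boundary; an in-memory split would instead route each record to the branches as it is produced, without the intermediate file.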


Gang Luo wrote:
> Hi all,
> According to the VLDB '09 paper, the split operator and all its successor
> operators reside in memory without any blocking in between. However, the source 
> code (version 0.7) shows that a MR job is actually ended when it meets the split 
> operator and multiple new MR jobs are created, each representing one branch. 
> This write-once-read-multiple-times method is different from the in-memory 
> method mentioned in that paper. Did Pig change the strategy for split, or is 
> there still an in-memory version of split I didn't discover?
> Thanks,
> -Gang
