hadoop-pig-dev mailing list archives

From Mridul Muralidharan <mrid...@yahoo-inc.com>
Subject Re: Implicit Split
Date Mon, 05 May 2008 06:48:11 GMT

This is something which is used quite heavily (at least by our team).
I was hoping this would be expanded, for example by adding support for
nested statements in FILTER as well (as in FOREACH). Currently we have
to hack around this using FOREACH and flag statements to get the same
functionality, since FILTER does not support it.

Regards,
Mridul

Santhosh Srinivasan wrote:
> Pig currently allows implicit splits within the foreach block. An
> example that illustrates this behaviour follows:
> 
>     A = load 'input1';
>     B = load 'input2';
>     C = cogroup A by $0, B by $0;
>     D = foreach C {
>         XX = filter A by $0 > 5;
>         XY = filter B by $0 > 5; -- at this point there is an implicit split in the foreach plan
>         generate XX.$1, XY.$1;   -- generate must handle the merge, as its inputs come from XX and XY
>     }
> 
> Notice that there is an implicit split in the foreach plan. Each input
> tuple from C has to be piped to both XX and XY. The generate now has to
> handle the merge, since both XX and XY serve as its inputs. The inputs
> to generate therefore form a DAG rather than a tree.
> 
>       Generate
>       /      \
>     XX        XY
>       \      /
>       Foreach
> 
> This makes the execution pipeline fairly complex. Should we restrict the
> usage so that DAGs are not allowed as input to generate?
> 
> 
> Thoughts?
> 
> Thanks,
> Santhosh
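The data flow described in the quoted message can be sketched in Python. This is a minimal illustration, assuming a Pig tuple is modeled as a Python tuple and a bag as a list; the names `filter_bag`, `project_bag` and `nested_foreach` are hypothetical and not part of Pig. It does not reflect how Pig actually executes the plan: each branch's bag is fully materialized before the generate step, which sidesteps the pipelining question entirely.

```python
from typing import Any, List, Tuple

# Assumption: a Pig tuple is modeled as a Python tuple and a bag as a list.
Row = Tuple[Any, ...]
Bag = List[Row]


def filter_bag(bag: Bag, threshold: int) -> Bag:
    """Models `filter X by $0 > threshold` over one bag (hypothetical helper)."""
    return [t for t in bag if t[0] > threshold]


def project_bag(bag: Bag, col: int) -> Bag:
    """Models `X.$col`: projects one column of every tuple in a bag."""
    return [(t[col],) for t in bag]


def nested_foreach(cogrouped: List[Tuple[Any, Bag, Bag]]) -> List[Tuple[Bag, Bag]]:
    """Models: D = foreach C { XX = filter A ...; XY = filter B ...; generate XX.$1, XY.$1; }"""
    out = []
    for _group, bag_a, bag_b in cogrouped:
        # Implicit split: the same input tuple from C feeds both branches.
        xx = filter_bag(bag_a, 5)
        xy = filter_bag(bag_b, 5)
        # Merge: generate needs the results of both branches for this same
        # input tuple before it can emit one output tuple.
        out.append((project_bag(xx, 1), project_bag(xy, 1)))
    return out


if __name__ == "__main__":
    # Each element of C is (group, bag of A tuples, bag of B tuples).
    C = [(7, [(7, "a"), (7, "b")], [(7, "x")]), (3, [(3, "c")], [(3, "y")])]
    print(nested_foreach(C))  # [([('a',), ('b',)], [('x',)]), ([], [])]
```

Each cogrouped tuple is consumed by both filters, and the two results are recombined into a single output tuple, which is the diamond shape shown in the diagram.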

