hadoop-pig-dev mailing list archives

From "Santhosh Srinivasan" <...@yahoo-inc.com>
Subject Implicit Split
Date Sat, 03 May 2008 00:53:40 GMT
Pig currently allows implicit splits within the foreach block. An
example that illustrates this behaviour follows:

    A = load 'input1';
    B = load 'input2';
    C = cogroup A by $0, B by $0;
    D = foreach C do {
        XX = filter A by $0 > 5;
        //at this point, there is an implicit split in the foreach plan
        XY = filter B by $0 > 5;
        //here the generate needs to handle the merge as its inputs are from XX and XY
        generate XX.$1, XY.$1;
    }

Notice the implicit split in the foreach plan: each input tuple from C
has to be piped to both XX and XY. The generate then has to handle the
merge, since both XX and XY serve as its inputs. The inputs to generate
therefore form a DAG and not a tree.

      Generate
      /      \
    XX        XY
      \      /
      Foreach
This makes the execution pipeline fairly complex. Should we restrict the
nested foreach block so that DAGs are not allowed as input to the generate?


Thoughts?

Thanks,
Santhosh
