hadoop-pig-dev mailing list archives

From "Santhosh Srinivasan" <...@yahoo-inc.com>
Subject Implicit Split
Date Sat, 03 May 2008 00:53:40 GMT
Pig currently allows implicit splits within the foreach block. An
example that illustrates this behaviour follows:

    A = load 'input1';
    B = load 'input2';
    C = cogroup A by $0, B by $0;
    D = foreach C {
        XX = filter A by $0 > 5;
        XY = filter B by $0 > 5;
        -- At this point there is an implicit split in the foreach plan.
        -- The generate below has to handle the merge, because its inputs
        -- come from both XX and XY.
        generate XX.$1, XY.$1;
    };

Notice that there is an implicit split in the foreach plan: each input
tuple from C has to be piped to both XX and XY. The generate then has to
handle the merge, since both XX and XY serve as its inputs. The inputs to
generate therefore form a DAG and not a tree.

          C
        /   \
      XX     XY
        \   /
       generate
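
To make the split and merge concrete, here is a minimal Python sketch of what the nested plan does for each tuple of C. It is only a model of the data flow described above, not Pig's implementation: the function names (filter_gt5, project, run_foreach) and the sample data are made up for illustration.

```python
# Hypothetical model of the nested foreach plan; none of these names are Pig APIs.

def filter_gt5(bag):
    """Keep tuples whose first field ($0) is greater than 5."""
    return [t for t in bag if t[0] > 5]


def project(bag, index):
    """Project one field out of every tuple in the bag (like XX.$1)."""
    return [(t[index],) for t in bag]


def run_foreach(cogrouped):
    """Evaluate the nested plan for each (group, bag_A, bag_B) tuple of C."""
    out = []
    for group, bag_a, bag_b in cogrouped:
        # Implicit split: the same input tuple feeds both filter branches.
        xx = filter_gt5(bag_a)
        xy = filter_gt5(bag_b)
        # Merge: generate consumes both branches to build one output tuple.
        out.append((project(xx, 1), project(xy, 1)))
    return out


# Made-up sample of cogroup output: (group key, bag from A, bag from B).
C = [
    (7, [(7, "a1"), (7, "a2")], [(7, "b1")]),
    (3, [(3, "a3")], []),
]
D = run_foreach(C)
```

In this model the split and the merge both happen inside one loop body. In a pipelined executor, the operator feeding XX and XY would have two consumers, and the generate would have to wait for both of them.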

This makes the execution pipeline fairly complex. Should we restrict
nested foreach blocks so that the inputs to the generate cannot form a DAG?
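
If we did restrict this, one possible check (purely a sketch, not a proposal for the actual plan classes) would be to reject any nested plan in which an operator has more than one consumer. The plan representation below, a dict from operator name to the operators it reads from, and the name shared_nodes are assumptions made for this example.

```python
from collections import Counter


def shared_nodes(plan):
    """Return the operators that are consumed by more than one operator.

    `plan` maps each operator name to the list of operators it reads from.
    An empty result means no operator's output is shared, i.e. the plan is
    a tree (or a forest) rather than a DAG with a split.
    """
    consumers = Counter(src for inputs in plan.values() for src in inputs)
    return sorted(node for node, count in consumers.items() if count > 1)


# The plan from the example: C feeds both XX and XY, which feed generate.
dag_plan = {"XX": ["C"], "XY": ["C"], "generate": ["XX", "XY"]}

# A plan with a single filter branch has no shared operator.
tree_plan = {"XX": ["C"], "generate": ["XX"]}

dag_shared = shared_nodes(dag_plan)
tree_shared = shared_nodes(tree_plan)
```

With this check, the example above would be flagged because C is consumed by both XX and XY, while the single-branch plan would pass.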


