pig-dev mailing list archives

From "pi song" <pi.so...@gmail.com>
Subject Re: Implicit Split
Date Sat, 03 May 2008 14:17:16 GMT
Conceptually, if we could generalize our nested data processing model into a
recursive definition (partly done already through the introduction of inner
plans), this problematic inner plan could be constructed easily by applying
the invariant plan compilation logic.  This sounds cool, right? I really want
to see how far we can go (the point where theory meets the practical world).

Back to your question: I want to see the new execution engine working as
soon as possible, so I agree with you that we don't have to support this for
the time being (this use case is not very common). I think it shouldn't be
too difficult to add this functionality later, based on our current inner
plan design.

BTW, let's see what other people think.


On Sat, May 3, 2008 at 10:53 AM, Santhosh Srinivasan <sms@yahoo-inc.com> wrote:

> Pig currently allows implicit splits within the foreach block. An
> example that illustrates this behaviour follows:
>    A = load 'input1';
>    B = load 'input2';
>    C = cogroup A by $0, B by $0;
>    D = foreach C do {
>        XX = filter A by $0 > 5;
>        XY = filter B by $0 > 5; //at this point, there is an implicit split in the foreach plan
>        generate XX.$1, XY.$1; //here the generate needs to handle the merge as its inputs are from XX and XY
>    }
> Notice that there is an implicit split in the foreach plan. Each input
> tuple from C has to be piped to XX and XY. The generate has to now
> handle the merge as both XX and XY serve as inputs. The inputs to
> generate are now a DAG and not a tree.
>      Generate
>      /      \
>    XX        XY
>      \      /
>      Foreach
> This makes the execution pipeline fairly complex. Should we restrict the
> usage to not allow DAGs as input to the generate?
> Thoughts?
> Thanks,
> Santhosh
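
For readers following the thread, the split and merge described in the quoted
message can be sketched outside Pig. The Python below is an illustration only,
not Pig's execution engine: cogroup and foreach_generate are hypothetical
helper names, tuples are plain Python tuples with $0 and $1 at indexes 0 and 1,
and bags are plain lists.

```python
from collections import defaultdict


def cogroup(a, b):
    """Group tuples of a and b by their first field ($0), like `cogroup A by $0, B by $0`."""
    groups = defaultdict(lambda: ([], []))
    for t in a:
        groups[t[0]][0].append(t)
    for t in b:
        groups[t[0]][1].append(t)
    # One output tuple per key: (group, bag of A tuples, bag of B tuples).
    return [(key, bag_a, bag_b) for key, (bag_a, bag_b) in sorted(groups.items())]


def foreach_generate(cogrouped, threshold=5):
    """Model the foreach block: one input tuple feeds two filters, then one generate."""
    output = []
    for _group, bag_a, bag_b in cogrouped:
        # Implicit split: the same input tuple is piped to both branches.
        xx = [t for t in bag_a if t[0] > threshold]  # XX = filter A by $0 > 5
        xy = [t for t in bag_b if t[0] > threshold]  # XY = filter B by $0 > 5
        # Merge: generate consumes both branches, so its inputs form a DAG.
        output.append(([t[1] for t in xx], [t[1] for t in xy]))  # generate XX.$1, XY.$1
    return output
```

In this sketch the split is just two reads of the same loop variable. In a
pipelined operator plan the same tuple has to be routed to two operators and
the results joined back up before generate, which is where the complexity
discussed in this thread comes from.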
