hadoop-pig-dev mailing list archives

From "pi song" <pi.so...@gmail.com>
Subject Re: Implicit Split
Date Mon, 05 May 2008 12:34:52 GMT
Mridul,

By design, we are heading to the point where all (or nearly all) operators
are supported in nested queries, and nesting will not be limited to a
single level.

Pi

On Mon, May 5, 2008 at 4:48 PM, Mridul Muralidharan <mridulm@yahoo-inc.com>
wrote:

>
> This is something that is used quite heavily (at least by our team).
> I was hoping this would be expanded, for example by adding support for
> nested statements in FILTER as well (as in FOREACH). Currently we have to
> hack around this using FOREACH and flag statements to get that
> functionality, since FILTER does not support it.
>
> Regards,
> Mridul
>
>
> Santhosh Srinivasan wrote:
>
> > Pig currently allows implicit splits within the foreach block. An
> > example that illustrates this behaviour follows:
> >
> >    A = load 'input1';
> >    B = load 'input2';
> >    C = cogroup A by $0, B by $0;
> >    D = foreach C do {
> >        XX = filter A by $0 > 5;
> >        XY = filter B by $0 > 5; // at this point, there is an implicit
> >                                 // split in the foreach plan
> >        generate XX.$1, XY.$1;   // here the generate needs to handle the
> >                                 // merge, as its inputs are from XX and XY
> >    }
> >
> > Notice that there is an implicit split in the foreach plan. Each input
> > tuple from C has to be piped to XX and XY. The generate has to now
> > handle the merge as both XX and XY serve as inputs. The inputs to
> > generate are now a DAG and not a tree.
> >
> >      Generate
> >      /      \
> >    XX        XY
> >      \      /
> >      Foreach
> >
> > This makes the execution pipeline fairly complex. Should we restrict the
> > usage to not allow DAGs as input to the generate?
> >
> >
> > Thoughts?
> >
> > Thanks,
> > Santhosh
> >
>
>
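
The split and merge described in the quoted foreach block can be sketched outside Pig. The Python below is only an illustrative model, not Pig's execution pipeline: the function names `cogroup` and `foreach_with_split`, the two-field row layout, and the sample data are all assumptions made for this sketch. It shows each cogrouped tuple being fed to two independent filter branches (the implicit split) and a single generate step consuming both branches (the merge).

```python
from typing import List, Tuple

Row = Tuple[int, str]                       # ($0, $1), a hypothetical two-field row
Group = Tuple[int, List[Row], List[Row]]    # (key, bag from A, bag from B)


def cogroup(a: List[Row], b: List[Row]) -> List[Group]:
    """Model of: C = cogroup A by $0, B by $0;"""
    keys = sorted({r[0] for r in a} | {r[0] for r in b})
    return [
        (k, [r for r in a if r[0] == k], [r for r in b if r[0] == k])
        for k in keys
    ]


def foreach_with_split(groups: List[Group]) -> List[Tuple[List[str], List[str]]]:
    """Model of the nested foreach block with its two filters and one generate."""
    out = []
    for _key, bag_a, bag_b in groups:
        # Implicit split: the same input tuple feeds two independent branches.
        xx = [r for r in bag_a if r[0] > 5]   # XX = filter A by $0 > 5;
        xy = [r for r in bag_b if r[0] > 5]   # XY = filter B by $0 > 5;
        # Merge: generate needs both branches to build one output tuple,
        # so its inputs form a DAG rooted at the foreach input.
        out.append(([r[1] for r in xx], [r[1] for r in xy]))
    return out


if __name__ == "__main__":
    A = [(3, "a1"), (7, "a2"), (7, "a3")]
    B = [(7, "b1"), (9, "b2")]
    for row in foreach_with_split(cogroup(A, B)):
        print(row)
```

In this model the merge is trivial because both branches are evaluated inside one loop iteration. In a pipelined operator plan, the generate step would instead have to coordinate two upstream operators that share one source, which is the complexity the original question is about.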
