pig-user mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: Nested expressions in FOREACH vs FILTER
Date Fri, 28 Mar 2008 17:07:13 GMT
I'm not clear on the semantics you're proposing for filter.  I think 
what you're saying is that pig cannot apply a relation-level 
conditional (as opposed to a record-level conditional) in a natural way.

To be clear, pig can do a record-level conditional like:

c = foreach b generate ($0 > '1' ? $1 : $2);

But if you instead want to apply the conditional to the entire 
relation, we have to do something contorted (like the workaround you 
suggest).  You'd like to be able to do something like:

c = foreach b generate (any $0 > '1' ? $1 : $2);

where the 'any' operator is applied to all of $0 instead of being 
applied a row at a time.

Is that correct, or are you suggesting more than that?  Or perhaps 
something altogether different?

Alan.

On Mar 27, 2008, at 3:37 PM, Mridul Muralidharan wrote:

> Hi,
>
>   FOREACH supports nested expressions of form :
> var1 = FOREACH var { <expr>'s; GENERATE <tuple> }
>
> Similar functionality does not seem to be available with FILTER.
> That is, slightly complex filter expressions are not possible, 
> particularly ones that need to process the bags/tuples contained in 
> the tuples of the relation in question.
>
> Mirroring FOREACH functionality, something like this would be great :
>
> var1 = FILTER var {
>   t1 = <expr>;
>   t2 = <expr>;
>   ...
>   BY (conds);
> }
>
>
> The workaround for the immediate problem I am facing is to use FOREACH 
> to generate something like $status, <tuple>, then FILTER on $status, 
> followed by another FOREACH to remove the status.
>
> Regards,
> Mridul
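
The three-step workaround described in the quoted message could be sketched 
roughly as below. This is only an illustration under stated assumptions: the 
input path, the field names (id, items, v), the threshold, and the aliases 
flagged and kept are all hypothetical and do not come from the thread. The 
condition used (keep a record if its bag contains any value greater than 1) 
is just one example of a filter that needs to look inside a nested bag.

```pig
-- Hypothetical input: each record carries an id and a bag of integer values.
var = LOAD 'input' AS (id:chararray, items:bag{t:tuple(v:int)});

-- Step 1: use a nested FOREACH to compute a status flag next to the original fields.
flagged = FOREACH var {
    big = FILTER items BY v > 1;          -- nested expression over the bag
    GENERATE (IsEmpty(big) ? 0 : 1) AS status, id, items;
};

-- Step 2: FILTER on the computed status.
kept = FILTER flagged BY status == 1;

-- Step 3: a second FOREACH drops the status column again.
var1 = FOREACH kept GENERATE id, items;
```

The extra FOREACH on each side of the FILTER is the overhead that the proposed 
nested FILTER syntax would avoid.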

