pig-dev mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: Proposed design for adding control flow to Pig
Date Sat, 16 Oct 2010 02:38:04 GMT
Basically it's a matter of clarity.  I agree that it creates a lot of
boilerplate, but we thought it made it clearer exactly what was
being passed in and out of the macro, especially in cases where a
macro returns multiple outputs (that is, you can't just look at the
last line and see what it is returning).  In the original proposal the
store would basically act as a return statement.  But perhaps we're
optimizing for the less common case.  If others agree that a more
terse (but less clear) syntax is better, I'm open to that.

There is one change I would want to make.  In your proposal, it isn't
obvious which alias is input and which is output without examining the
macro body.  In cases where the macro is more than a few lines this
will be hard to use.  This could be addressed, though, by adding an
'out' keyword, so that it becomes:

define bot_cleanser[X out, Y](user) {
	X = filter Y by not is_a_bot($user);
}
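
To make the substitution rules in this thread concrete, here is a minimal Python sketch of how such an inline expansion might work. It is an illustration only, not the proposed Pig implementation: the function name expand_macro, the macro_<id>_ renaming scheme, and the call_id argument are all hypothetical, and the regex-based matching would be confused by alias names that appear inside string literals.

```python
import re

def expand_macro(body, alias_params, alias_args, value_params, value_args, call_id):
    """Expand a macro body given as a list of Pig-like statement strings.

    Hypothetical sketch: shared aliases are replaced by the caller's aliases,
    $params by the caller's values, and every other alias assigned inside the
    body gets a per-call prefix so it cannot collide with the caller's names.
    """
    shared = dict(zip(alias_params, alias_args))
    values = dict(zip(value_params, value_args))

    # Aliases assigned in the body: an identifier followed by "=" at line start.
    defined = set()
    for line in body:
        match = re.match(r"\s*([A-Za-z_]\w*)\s*=", line)
        if match:
            defined.add(match.group(1))

    def rename(match):
        name = match.group(0)
        if name in shared:              # alias shared with the caller
            return shared[name]
        if name in defined:             # macro-local alias: make it unique
            return f"macro_{call_id}_{name}"
        return name                     # keyword, function name, field, ...

    expanded = []
    for line in body:
        # Rename aliases first; the lookbehind skips $param references.
        line = re.sub(r"(?<!\$)\b[A-Za-z_]\w*\b", rename, line)
        # Then substitute $param with the caller-supplied value.
        line = re.sub(r"\$(\w+)", lambda m: values.get(m.group(1), m.group(0)), line)
        expanded.append(line)
    return expanded

# The bot_cleanser body from above, with X (the 'out' alias) bound to B
# and Y bound to A at the call site.
example = expand_macro(
    ["X = filter Y by not is_a_bot($user);"],
    ["X", "Y"], ["B", "A"],
    ["user"], ["'username'"],
    call_id=1,
)
```

Under these assumptions, example holds the single statement B = filter A by not is_a_bot('username');. An 'out' marker would not change the expansion itself; it would only let a reader, or a checker, confirm from the signature that B is assigned by the macro and A is only read.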

Alan.

On Oct 15, 2010, at 5:58 PM, Scott Carey wrote:

> I'm most interested in the macro expansion and importing other files
> for shared common code.  I could be missing something, but is the
> TempStorage thing necessary?
>
> bot_filter.pig:
> --------------
> define bot_cleanser(user) {
>    A = load 'bc_input' using TempStorage();
>    B = filter A by not is_a_bot($user);
>    store B into 'bc_output' using TempStorage();
> }
> ----------------
> main.pig:
> -------------------
> import bot_filter.pig;
>
> A = load 'fact';
> store A into 'bc_input' using TempStorage();
> inline bot_cleanser('username');
> B = load 'bc_output' using TempStorage();
> C = group B by user;
> ...
> store Z into 'processed';
> -----------------------
>
> Couldn't we pass aliases in instead and remove lots of boilerplate?
>
> bot_filter.pig:
> --------------
> define bot_cleanser[X,Y](user) {
>    X = filter Y by not is_a_bot($user);
> }
> ----------------
> main.pig:
> -------------------
> import bot_filter.pig;
>
> A = load 'fact';
> inline bot_cleanser[B,A]('username');
> C = group B by user;
> ...
> store Z into 'processed';
> -----------------------
>
> The inline then would substitute B for X, A for Y, and 'username'
> for user.  Aliases are separated from other parameters because we
> may actually be declaring new aliases when inlining and it should be
> easier to deal with the semantic differences that way.  In
> particular, the [B,A] above are essentially declaring that the
> macro 'shares' these aliases, and all other aliases do not overlap.
>
> Any aliases not declared up front are renamed so as not to collide when
> inlined.  I look at the macro expansion and function examples and
> see tons of alias-naming boilerplate that should IMO be implicit
> somehow.  Pig already has a lot of alias and field naming
> boilerplate; I would like to avoid introducing more.  Otherwise, I'm
> sure I'll use a preprocessor again to get rid of it :).
>
>
>
>
> On Oct 15, 2010, at 4:39 PM, Alan Gates wrote:
>
>> After several months of mulling things over, Richard and I have put
>> together a proposed design for adding control flow to Pig.  See http://wiki.apache.org/pig/TuringCompletePig
>> for complete details.  Please give us your feedback.
>>
>> Alan.
>

