pig-dev mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: Automatic alias generation in cogroups
Date Mon, 05 May 2008 19:11:20 GMT
I agree that the aliases used should be overridable, but I much prefer the 
current way as a default rather than $0, $1, ... Once you know what's 
happening, it makes sense and it's easy to use.

The complaint about overloading reflects a misunderstanding of the scoping of 
foreach. If users don't understand it now, schemas are bound to confuse them 
as well whenever there is any kind of conflict.

Adding the AS to cogroup sounds good. What about the fields of group?

cogroup a by (age, height), b by (avgAge, avgHeight);

Shouldn't you be able to pick the schema of group?


On Monday 05 May 2008 11:21:13 Alan Gates wrote:
> Currently in Pig, aliases are generally only assigned by the user.
> There is one exception to this rule, which is (co)group.  Consider a
> script like:
> a = load 'myfile';
> b = load 'anotherfile';
> c = cogroup a by $0, b by $0;
> The relation c will have the aliases: group, a, b without the user
> having assigned those names.
> There are a couple of problems with this.  First, we've had a number of
> users complain that this is confusing.  a and b are suddenly overloaded
> terms in the script.  Consider, for example, that both of the following
> lines are possible and refer to entirely different meanings for 'a':
> d = filter a by $0 eq 'fred';
> d = foreach c generate COUNT(a);
> In the first line, 'a' refers to the relation produced by the load.  In
> the second, it refers to the bag that is the second field ($1) of the
> relation 'c'.  The same holds for 'group', which is now both a keyword
> and an alias (yuck!).
> The second issue is that this is inconsistent with the rest of the
> language.  Everywhere else, Pig Latin lets users define aliases, but here
> it does so automatically.
> So the proposal is to remove this automatic aliasing from cogroup.
> Cogroup would support AS, so that users could define aliases for these
> bags if they desired.  This may be a little awkward, as users would need
> to remember to provide an alias for the group key before aliasing the bags.
> For example, taking the script above:
> c = cogroup a by $0, b by $0 as name, file1, file2;
> So name would now be the alias for the group key (formerly aliased as
> 'group'), file1 for the first bag (formerly 'a') and file2 for the
> second bag (formerly 'b').
> Everything said here applies to group as well as cogroup.
> Obviously, this change isn't backward compatible.
> Thoughts?
> Alan.
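
For readers new to (co)group, the behaviour debated in this thread can be sketched as a toy Python model. This is not Pig's implementation, and the helper names `cogroup` and `schema` are hypothetical. It shows that each output tuple is (group key, bag from the first input, bag from the second input), which is why the bags line up with positions $1, $2, and so on. It also contrasts the current automatic aliases ('group' plus the input relation names) with user-chosen aliases supplied through the proposed AS clause, which is assumed here to name the group key first and then each bag.

```python
from collections import defaultdict


def cogroup(inputs, key_fn):
    """Toy model of (co)group, not Pig's implementation.

    `inputs` maps each input relation's name to a list of tuples, and
    `key_fn` extracts the grouping key from a tuple (like `by $0`).
    Returns one tuple per distinct key, in first-seen order:
    (key, bag from the first input, bag from the second input, ...).
    A key missing from one input still appears, with an empty bag.
    """
    names = list(inputs)
    # For every key, keep one bag (modelled as a list) per input relation.
    groups = defaultdict(lambda: {name: [] for name in names})
    for name in names:
        for row in inputs[name]:
            groups[key_fn(row)][name].append(row)
    return [(key, *(bags[name] for name in names)) for key, bags in groups.items()]


def schema(input_names, as_aliases=None):
    """Field aliases of the (co)group result.

    Without AS (current behaviour): 'group' followed by the input names.
    With AS (the proposal, modelled hypothetically): the user's names,
    which must cover the group key and every bag, in that order.
    """
    if as_aliases is None:
        return ["group", *input_names]
    if len(as_aliases) != len(input_names) + 1:
        raise ValueError("AS must name the group key and every bag")
    return list(as_aliases)


a = [("fred", 1), ("joe", 2)]
b = [("fred", "x")]
# Models: c = cogroup a by $0, b by $0;
c = cogroup({"a": a, "b": b}, key_fn=lambda row: row[0])
print(c)                                                # one (key, bag a, bag b) per key
print(schema(["a", "b"]))                               # current default aliases
print(schema(["a", "b"], ["name", "file1", "file2"]))   # proposed AS aliases
```

Grouping on several fields, as in Ben's (age, height) example, corresponds to a `key_fn` that returns a tuple. The group key is then itself a tuple, and the names of its fields are exactly what his schema question is about.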
