hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1634) Multiple names for the "group" field
Date Tue, 21 Sep 2010 17:52:36 GMT

    [ https://issues.apache.org/jira/browse/PIG-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913115#action_12913115 ]

Alan Gates commented on PIG-1634:

In Pig's semantics c.group, c.foo, and c.bar are all separate columns, and only the first
one is $0.  Because the bags from the cogroup contain all columns in the row (not just non-key
columns), foo is a column inside bag a and bar is a column inside bag b.

Changing something like this would be a radical shift of Pig semantics.
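
For illustration, here is a minimal sketch of how the use case from the report quoted below
can be written under the current semantics. It assumes the same file names and schemas as
the report, and it is only one way to write it, not necessarily the workaround the reporter
had in mind.

{code}
-- Same relations as in the report quoted below.
pages = LOAD 'pages.dat' AS (url, pagerank);
visits = LOAD 'user_log.dat' AS (user_id, url);
page_visits = COGROUP pages BY url, visits BY url;
-- page_visits has three fields: group ($0), pages ($1, a bag) and visits ($2, a bag).
-- Each bag holds whole rows, so pages.url and visits.url still exist inside the bags.
frequent_visits = FILTER page_visits BY COUNT(visits) >= 2;
-- Refer to the key by its one name, "group", and rename it with AS.
answer = FOREACH frequent_visits GENERATE group AS url, FLATTEN(pages.pagerank);
{code}

Only the final GENERATE differs from the script in the report: the key is addressed as
"group" (equivalently $0) rather than by the name of the column it was built from.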

> Multiple names for the "group" field
> ------------------------------------
>                 Key: PIG-1634
>                 URL: https://issues.apache.org/jira/browse/PIG-1634
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0
>            Reporter: Viraj Bhat
> I am hoping that in Pig, if I type
> {quote} c = cogroup a by foo, b by bar {quote}
> the fields c.group, c.foo and c.bar would all map to c.$0.
> This would improve the readability of the Pig script.
> Here's a real use case:
> {code}
> ---
> pages = LOAD 'pages.dat'  AS (url, pagerank);
> visits = LOAD 'user_log.dat'  AS (user_id, url);
> page_visits = COGROUP pages BY url, visits BY url;
> frequent_visits = FILTER page_visits BY COUNT(visits) >= 2;
> answer = FOREACH frequent_visits  GENERATE url, FLATTEN(pages.pagerank);
> ---
> {code}
> (The important part is the final GENERATE statement, which references the field "url",
> which was the grouping field in the earlier COGROUP.) To get it to work I have to write
> it in a less intuitive way.
> Maybe with the new parser changes in Pig 0.9 it would be easier to specify that.
> Viraj

