datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-42) Simplify BagGroup output
Date Mon, 28 Apr 2014 15:14:17 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983086#comment-13983086 ]

Matthew Hayes commented on DATAFU-42:
-------------------------------------

The current implementation mimics how GROUP works in Pig.  When you group a relation by a
key, that key is available as {{group}}, but it is also kept in each tuple of the bag.  I'll
think about this some more, but I'm not sure the proposed change makes sense.  If the concern
is the additional space used by this key, it should be possible to project it out after
BagGroup is invoked.
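
To make the two output shapes concrete, here is a minimal Python sketch that models the behavior described above. It is not DataFu or Pig code: the function names `bag_group` and `project_out_key` are hypothetical, bags are modeled as lists of tuples, and the sample tuples are taken from the quoted issue below.

```python
from collections import OrderedDict


def bag_group(bag, key_index=0):
    """Group tuples by one field, GROUP-style: each group's bag keeps the
    full original tuples, so the key appears both as the group label and
    inside every tuple. (Hypothetical model, not the DataFu UDF.)"""
    groups = OrderedDict()
    for t in bag:
        groups.setdefault(t[key_index], []).append(t)
    return [(key, tuples) for key, tuples in groups.items()]


def project_out_key(grouped, key_index=0):
    """Drop the key field from every tuple inside each group's bag,
    modeling a projection applied after grouping."""
    return [
        (key, [t[:key_index] + t[key_index + 1:] for t in tuples])
        for key, tuples in grouped
    ]


bag = [("A", 1), ("A", 2), ("B", 3)]

grouped = bag_group(bag)
# Key is repeated inside each inner tuple, as GROUP does.
print(grouped)

slim = project_out_key(grouped)
# Key appears only once per group; the values are unchanged.
print(slim)
```

In this model the projection is a separate step applied to the grouped result, so callers who want the GROUP-like shape keep it and callers who care about space can drop the key afterwards.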

> Simplify BagGroup output
> ------------------------
>
>                 Key: DATAFU-42
>                 URL: https://issues.apache.org/jira/browse/DATAFU-42
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Sam Steingold
>
> {{BagGroup}} keeps the redundant {{group}} information in its output.
> E.g., see [DATAFU-38]:
> {code}
> (1,{(b,1),(a,2)},{(B,{(B,3)}),(A,{(A,1),(A,2)})})
> (2,{(c,1),(b,2)},{(B,{(B,3),(B,5)}),(A,{(A,1),(A,2)}),(C,{(C,4),(C,6)})})
> {code}
> can be
> {code}
> (1,{(b,1),(a,2)},{(B,{3}),(A,{1,2})})
> (2,{(c,1),(b,2)},{(B,{3,5}),(A,{1,2}),(C,{4,6})})
> {code}
> without loss of information
> Given that the bug [DATAFU-38] rendered this function quite useless and it was fixed
> just last week, I think {{BagGroup}} has not been used before, so this backward-incompatible
> change will not break any existing code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
