datafu-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Hayes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-38) BagGroup merges rows
Date Sat, 26 Apr 2014 12:47:14 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981971#comment-13981971
] 

Matthew Hayes commented on DATAFU-38:
-------------------------------------

Yea looks like the assignee is wrong.  I can't figure out how to add Sam to the contributors
list because "Sam" returns over 600 results and you can't enter a username.

> BagGroup merges rows
> --------------------
>
>                 Key: DATAFU-38
>                 URL: https://issues.apache.org/jira/browse/DATAFU-38
>             Project: DataFu
>          Issue Type: Bug
>            Reporter: Sam
>             Fix For: 1.3.0
>
>         Attachments: BagGroup-38.patch
>
>
> load 
> {code}
> 1,a,A,1
> 1,b,A,2
> 1,a,B,3
> 2,c,C,4
> 2,b,B,5
> 2,b,C,6
> {code}
> using {{tmp_datafu = load 'test' using PigStorage(',') as (id:chararray, domain:chararray,
keyword:chararray, weight:int);}}
> and do
> {code}
> tmp_roll = foreach (group tmp_datafu by id) generate
>   group as id,
>   CountEach(tmp_datafu.domain) as domains,
>   BagGroup(tmp_datafu.(keyword,weight),tmp_datafu.keyword) as keywords;
> {code}
> the result is
> {code}
> (1,{(b,1),(a,2)},{(B,{(B,3)}),(A,{(A,1),(A,2)})})
> (2,{(c,1),(b,2)},{(B,{(B,3),(B,5)}),(A,{(A,1),(A,2)}),(C,{(C,4),(C,6)})})
> {code}
> instead of
> {code}
> (1,{(b,1),(a,2)},{(B,{(B,3)}),(A,{(A,1),(A,2)})})
> (2,{(c,1),(b,2)},{(B,{(B,5)}),(C,{(C,4),(C,6)})})
> {code}
> see also
> http://stackoverflow.com/questions/22945236/how-do-i-accumulate-vectors-into-a-map



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message