datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-39) RFE: BagSum
Date Tue, 29 Apr 2014 20:03:14 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984719#comment-13984719 ]

Matthew Hayes commented on DATAFU-39:
-------------------------------------

I don't think you should worry about it.  I can't see a more concise way to do this.  The
only alternative I see is to write a UDF, but I doubt such a UDF would be useful in general.
Its only benefit would be code that is more concise and maybe slightly more efficient, and
I think it could make scripts more confusing, since this can already be expressed in Pig
Latin in a pretty readable way.
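
For illustration, here is a minimal sketch of how the test case quoted below might be
written with existing Pig features only. It is not taken from the thread; the relation name
`sums` and the field name `total` are assumptions made for this example. The idea is to sum
per (id, key) pair first, then regroup by id to collect the per-key totals into a bag.

```pig
-- Same input schema as the test case in the issue description.
data = LOAD 'input' AS (id:int, key:chararray, val:int);

-- Step 1: group by the (id, key) pair and sum the values within each pair.
-- 'sums' and 'total' are illustrative names, not from the original issue.
sums = FOREACH (GROUP data BY (id, key)) GENERATE
         FLATTEN(group) AS (id, key),
         SUM(data.val) AS total;

-- Step 2: regroup by id so each id carries a bag of (key, total) tuples.
data2 = FOREACH (GROUP sums BY id) GENERATE
          group AS id,
          sums.(key, total) AS keys;

STORE data2 INTO 'output';
```

This takes two GROUP operations instead of one UDF call, which is the trade-off in
conciseness mentioned above. The order of tuples inside each output bag is not guaranteed.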

> RFE: BagSum
> -----------
>
>                 Key: DATAFU-39
>                 URL: https://issues.apache.org/jira/browse/DATAFU-39
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Sam Steingold
>
> I need a new function {{BagSum}} which would help me solve the problem described in [http://stackoverflow.com/questions/22945236/how-do-i-accumulate-vectors-into-a-map].
> Test case:
> {code}
>   /**
>   
>   define BagSum datafu.pig.bags.BagSum();
>   
>   data = LOAD 'input' AS (id:int, key:chararray, val:int);
>   describe data;
>   
>   data2 = FOREACH (GROUP data BY id) GENERATE group as id, BagSum(data.(key,val),data.key) as keys;
>   describe data2;
>   
>   STORE data2 INTO 'output';
>    */
>   @Multiline
>   private String bagSumTest;
>   
>   @Test
>   public void bagSumTest() throws Exception
>   {
>     PigTest test = createPigTestFromString(bagSumTest);
>     writeLinesToFile("input", "(1,A,1)","(1,B,2)","(2,A,3)","(3,A,4)","(1,C,5)","(1,C,6)",
>                      "(3,A,7)","(2,B,8)","(1,A,9)","(2,A,10)");
>     test.runScript();
>     assertOutput(test, "data2", "(1,{(A,10),(B,2),(C,11)})",
>                  "(2,{(A,13),(B,8)})","(3,{(A,11)})");
>   }
> {code}
> Thanks.
> (alternatively, please tell me how to implement this using existing features)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
