datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-39) RFE: BagSum
Date Sun, 27 Apr 2014 18:39:14 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982417#comment-13982417 ]

Matthew Hayes commented on DATAFU-39:
-------------------------------------

Now that you've fixed BagGroup, you can implement this without a BagSum UDF: Pig allows you to use SUM within a nested FOREACH, so each key's group can be summed directly.

This test passes:

{code}
/**
  define BagGroup datafu.pig.bags.BagGroup();
  
  data = LOAD 'input' USING PigStorage(',') AS (id:int, key:chararray, val:int);
  describe data;
  
  data2 = GROUP data BY id;
  data3 = FOREACH data2 GENERATE group as id, BagGroup(data,data.key) as grouped;
  
  describe data3;
  
  data4 = FOREACH data3 {
    summed = FOREACH grouped GENERATE group as key, SUM($1.val) as total;
    ordered = ORDER summed BY key;
    GENERATE id, ordered;
  }
  
  describe data4;
  
  STORE data4 INTO 'output';

   */
  @Multiline
  private String bagSumTest;
  
  @Test
  public void bagSumTest() throws Exception
  {
    PigTest test = createPigTestFromString(bagSumTest);
    writeLinesToFile("input", "1,A,1","1,B,2","2,A,3","3,A,4","1,C,5","1,C,6",
                     "3,A,7","2,B,8","1,A,9","2,A,10");
    test.runScript();
    assertOutput(test, "data4", 
                 "(1,{(A,10),(B,2),(C,11)})",
                 "(2,{(A,13),(B,8)})",
                 "(3,{(A,11)})");
  }
{code}
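To make the expected output of the test concrete, here is a plain-Java sketch of the same computation: group the rows by `id`, sum `val` per `key` within each group, and list the keys in sorted order. It is only an illustration of the semantics, not Pig and not a DataFu API; the class and method names (`BagSumSketch`, `parse`, `bagSum`, `format`) are hypothetical, and it assumes the same comma-separated input the test writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not Pig, and not a DataFu API.
public class BagSumSketch {

    // One input row, mirroring the schema (id:int, key:chararray, val:int).
    static final class Row {
        final int id;
        final String key;
        final int val;

        Row(int id, String key, int val) {
            this.id = id;
            this.key = key;
            this.val = val;
        }
    }

    // Parse "id,key,val" lines, assuming the same comma-separated input
    // that PigStorage(',') reads in the test.
    static List<Row> parse(String... lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",");
            rows.add(new Row(Integer.parseInt(f[0]), f[1], Integer.parseInt(f[2])));
        }
        return rows;
    }

    // Group rows by id (like GROUP data BY id), then sum val per key
    // (like BagGroup followed by SUM in the nested FOREACH).
    // TreeMap keeps both ids and keys sorted, like ORDER summed BY key.
    static Map<Integer, Map<String, Integer>> bagSum(List<Row> rows) {
        Map<Integer, Map<String, Integer>> result = new TreeMap<>();
        for (Row r : rows) {
            result.computeIfAbsent(r.id, id -> new TreeMap<>())
                  .merge(r.key, r.val, Integer::sum);
        }
        return result;
    }

    // Render one id the way the test's expected output is written,
    // e.g. (1,{(A,10),(B,2),(C,11)}).
    static String format(int id, Map<String, Integer> sums) {
        List<String> parts = new ArrayList<>();
        sums.forEach((key, total) -> parts.add("(" + key + "," + total + ")"));
        return "(" + id + ",{" + String.join(",", parts) + "})";
    }

    public static void main(String[] args) {
        List<Row> rows = parse("1,A,1", "1,B,2", "2,A,3", "3,A,4", "1,C,5", "1,C,6",
                               "3,A,7", "2,B,8", "1,A,9", "2,A,10");
        bagSum(rows).forEach((id, sums) -> System.out.println(format(id, sums)));
    }
}
```

Running `main` prints one line per id, matching the three tuples the test asserts. The sorted `TreeMap` is what makes the key order deterministic here; in the Pig script that role is played by `ORDER summed BY key`.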

> RFE: BagSum
> -----------
>
>                 Key: DATAFU-39
>                 URL: https://issues.apache.org/jira/browse/DATAFU-39
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Sam Steingold
>
> I need a new function {{BagSum}} which would help me solve the problem described in [http://stackoverflow.com/questions/22945236/how-do-i-accumulate-vectors-into-a-map].
> Test case:
> {code}
>   /**
>   
>   define BagSum datafu.pig.bags.BagSum();
>   
>   data = LOAD 'input' AS (id:int, key:chararray, val:int);
>   describe data;
>   
>   data2 = FOREACH (GROUP data BY id) GENERATE group as id, BagSum(data.(key,val),data.key) as keys;
>   describe data2;
>   
>   STORE data2 INTO 'output';
>    */
>   @Multiline
>   private String bagSumTest;
>   
>   @Test
>   public void bagSumTest() throws Exception
>   {
>     PigTest test = createPigTestFromString(bagSumTest);
>     writeLinesToFile("input", "(1,A,1)","(1,B,2)","(2,A,3)","(3,A,4)","(1,C,5)","(1,C,6)",
>                      "(3,A,7)","(2,B,8)","(1,A,9)","(2,A,10)");
>     test.runScript();
>     assertOutput(test, "data2", "(1,{(A,10),(B,2),(C,11)})",
>                  "(2,{(A,13),(B,8)})","(3,{(A,11)})");
>   }
> {code}
> Thanks.
> (alternatively, please tell me how to implement this using existing features)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
