datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DATAFU-41) BagGroup does not name bag field in some cases
Date Sun, 27 Apr 2014 18:47:22 GMT
Matthew Hayes created DATAFU-41:
-----------------------------------

             Summary: BagGroup does not name bag field in some cases
                 Key: DATAFU-41
                 URL: https://issues.apache.org/jira/browse/DATAFU-41
             Project: DataFu
          Issue Type: Bug
            Reporter: Matthew Hayes


For this test:

{code}
/**
  define BagSum datafu.pig.bags.BagSum();
  define BagGroup datafu.pig.bags.BagGroup();
  
  data = LOAD 'input' USING PigStorage(',') AS (id:int, key:chararray, val:int);
  describe data;
  
  data2 = GROUP data BY id;
  
  describe data2;
  
  data3 = FOREACH data2 GENERATE group as id, BagGroup(data,data.key) as grouped;
  
  describe data3;
  
  data4 = FOREACH data3 {
    summed = FOREACH grouped GENERATE group as key, SUM($1.val) as total;
    ordered = ORDER summed BY key;
    GENERATE id, ordered;
  }
  
  describe data4;
  
  STORE data4 INTO 'output';

   */
  @Multiline
  private String bagSumTest;
  
  @Test
  public void bagSumTest() throws Exception
  {
    PigTest test = createPigTestFromString(bagSumTest);
    writeLinesToFile("input", "1,A,1","1,B,2","2,A,3","3,A,4","1,C,5","1,C,6",
                     "3,A,7","2,B,8","1,A,9","2,A,10");
    test.runScript();
    assertOutput(test, "data4", 
                 "(1,{(A,10),(B,2),(C,11)})",
                 "(2,{(A,13),(B,8)})",
                 "(3,{(A,11)})");
  }
{code}

{{data3}} is described as:

{code}
data3: {id: int,grouped: {(group: chararray,data: {(id: int,key: chararray,val: int)})}}
{code}

However, if the first argument to {{BagGroup}} is changed from {{data}} to {{data.(key,val)}}, then {{data3}} is described as:

{code}
data3: {id: int,grouped: {(group: chararray,{(key: chararray,val: int)})}}
{code}

Note that the inner bag now has no name, so you have to reference it by position as {{$1}}.  There is a separate issue,
DATAFU-40, where even when the bag has the name {{data}} you can run into problems later.
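
For reference, the totals the test expects can be reproduced outside Pig. The following is only a rough sketch in plain Java, not DataFu code: the class {{BagGroupSketch}} and the helper {{sumByIdAndKey}} are hypothetical names introduced here. It groups the input lines by {{id}}, then by {{key}}, and sums {{val}}, which is what the script's GROUP, BagGroup, and SUM steps are meant to compute together.

```java
import java.util.Map;
import java.util.TreeMap;

public class BagGroupSketch {
    // Hypothetical helper (not part of DataFu): parses "id,key,val" lines,
    // groups them by id and then by key, and sums val per (id, key) pair.
    // TreeMap keeps ids and keys sorted, mirroring the ORDER ... BY key step.
    static Map<Integer, Map<String, Integer>> sumByIdAndKey(String... lines) {
        Map<Integer, Map<String, Integer>> result = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            int id = Integer.parseInt(fields[0]);
            String key = fields[1];
            int val = Integer.parseInt(fields[2]);
            result.computeIfAbsent(id, k -> new TreeMap<>())
                  .merge(key, val, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same input lines as the test writes to 'input'.
        Map<Integer, Map<String, Integer>> sums = sumByIdAndKey(
            "1,A,1", "1,B,2", "2,A,3", "3,A,4", "1,C,5", "1,C,6",
            "3,A,7", "2,B,8", "1,A,9", "2,A,10");
        // Prints {1={A=10, B=2, C=11}, 2={A=13, B=8}, 3={A=11}}
        System.out.println(sums);
    }
}
```

These totals match the tuples passed to {{assertOutput}} in the test, so the expected data is consistent with the input; the problem reported here concerns only the missing field name in the schema.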



--
This message was sent by Atlassian JIRA
(v6.2#6252)
