datafu-dev mailing list archives

From "Matthew Hayes (JIRA)"
Subject [jira] [Created] (DATAFU-41) BagGroup does not name bag field in some cases
Date Sun, 27 Apr 2014 18:47:22 GMT
Matthew Hayes created DATAFU-41:

             Summary: BagGroup does not name bag field in some cases
                 Key: DATAFU-41
             Project: DataFu
          Issue Type: Bug
            Reporter: Matthew Hayes

For this test:

  define BagSum datafu.pig.bags.BagSum();
  define BagGroup datafu.pig.bags.BagGroup();
  data = LOAD 'input' USING PigStorage(',') AS (id:int, key:chararray, val:int);
  describe data;
  data2 = GROUP data BY id;
  describe data2;
  data3 = FOREACH data2 GENERATE group as id, BagGroup(data,data.key) as grouped;
  describe data3;
  data4 = FOREACH data3 {
    summed = FOREACH grouped GENERATE group as key, SUM($1.val) as total;
    ordered = ORDER summed BY key;
    GENERATE id, ordered;
  };
  describe data4;
  STORE data4 INTO 'output';

  private String bagSumTest;
  public void bagSumTest() throws Exception
    PigTest test = createPigTestFromString(bagSumTest);
    writeLinesToFile("input", "1,A,1","1,B,2","2,A,3","3,A,4","1,C,5","1,C,6",
    assertOutput(test, "data4", 

{{data3}} is described as:

data3: {id: int,grouped: {(group: chararray,data: {(id: int,key: chararray,val: int)})}}

However, if we change the first argument of BagGroup from {{data}} to {{data.(key,val)}}, then {{data3}} is described as:

data3: {id: int,grouped: {(group: chararray,{(key: chararray,val: int)})}}

Note that the inner bag has no name, so you have to reference it by position as {{$1}}. There is a separate issue,
DATAFU-40, where you can run into problems later even when the bag has the name {{data}}.
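
For reference, the variation that produces the unnamed bag can be sketched as follows. This is a minimal sketch, assuming the only change from the test above is the first argument passed to BagGroup; it has not been checked against a particular Pig or DataFu release.

```pig
-- Sketch of the variation described above (assumption: only the first
-- BagGroup argument differs from the original test script).
define BagGroup datafu.pig.bags.BagGroup();

data  = LOAD 'input' USING PigStorage(',') AS (id:int, key:chararray, val:int);
data2 = GROUP data BY id;

-- Projecting data.(key,val) yields an inner bag with no alias,
-- matching the second schema shown above.
data3 = FOREACH data2 GENERATE group AS id, BagGroup(data.(key,val), data.key) AS grouped;
describe data3;

data4 = FOREACH data3 {
  -- $0 is the group key; $1 is the unnamed inner bag, so it can only be
  -- referenced by position.
  summed = FOREACH grouped GENERATE group AS key, SUM($1.val) AS total;
  ordered = ORDER summed BY key;
  GENERATE id, ordered;
};
describe data4;
```

Because the positional reference {{$1}} does not depend on the bag having a name, the same nested block works for both forms of the BagGroup call.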
