hadoop-pig-dev mailing list archives

From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-514) COUNT returns no results as a result of two filter statements in FOREACH
Date Fri, 31 Oct 2008 03:40:46 GMT
COUNT returns no results as a result of two filter statements in FOREACH
------------------------------------------------------------------------

                 Key: PIG-514
                 URL: https://issues.apache.org/jira/browse/PIG-514
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
            Reporter: Viraj Bhat
             Fix For: types_branch


The following sample script does not return any results. It uses a nested FOREACH to count
the student records filtered on record_type == 1 (and age == scores), and also the records
filtered on record_type == 0.

{code}
mydata = LOAD 'mystudentfile.txt' AS (record_type,name,age,scores,gpa);
--keep only what we need
mydata_filtered = FOREACH mydata GENERATE record_type, name, age, scores;
--group
mydata_grouped = GROUP mydata_filtered BY (record_type,age);

myfinaldata = FOREACH mydata_grouped {
     myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
     myfilter2 = FILTER mydata_filtered BY record_type == 0;
     GENERATE FLATTEN(group),
-- Only this COUNT causes the problem?
      COUNT(myfilter1) as col2,
      SUM(myfilter2.scores) as col3,
      COUNT(myfilter2) as col4;  };

--these statements confirm that the COUNT on the filter returns 1
--mycountdata = FOREACH mydata_grouped
--{
--      myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
--      GENERATE
--      COUNT(myfilter1) as colcount;
--};
--dump mycountdata;

dump myfinaldata;
{code}
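To make the intended semantics of the nested FOREACH concrete, here is a minimal Python model of what myfinaldata should compute for each (record_type, age) group. This is a sketch under stated assumptions, not Pig's implementation: the rows in ROWS are hypothetical (they are not the attached mystudentfile.txt), the function name final_data is made up for illustration, and treating SUM over an empty bag as None (null) is an assumption.

```python
from collections import defaultdict

# Hypothetical rows: (record_type, name, age, scores).
# These are illustrative values, NOT the attached mystudentfile.txt.
ROWS = [
    (0, "alice", 22, 20.0),
    (0, "bob", 22, 30.0),
    (1, "carol", 22, 22.0),
    (1, "dave", 24, 30.0),
]


def final_data(rows):
    """Model of myfinaldata: GROUP BY (record_type, age), then two nested FILTERs."""
    groups = defaultdict(list)
    for rec in rows:
        record_type, _name, age, _scores = rec
        groups[(record_type, age)].append(rec)

    out = []
    for key in sorted(groups):
        bag = groups[key]
        # myfilter1: record_type == 1 AND age == scores
        f1 = [r for r in bag if r[0] == 1 and r[2] == r[3]]
        # myfilter2: record_type == 0
        f2 = [r for r in bag if r[0] == 0]
        col2 = len(f1)  # COUNT(myfilter1)
        # SUM(myfilter2.scores); assumption: an empty bag sums to None (null)
        col3 = sum(r[3] for r in f2) if f2 else None
        col4 = len(f2)  # COUNT(myfilter2)
        # FLATTEN(group) expands the (record_type, age) key into two fields
        out.append((*key, col2, col3, col4))
    return out


if __name__ == "__main__":
    for row in final_data(ROWS):
        print(row)
```

Under this model every group yields exactly one output tuple, with a COUNT of 0 when a filter leaves the bag empty. So an empty result from dump myfinaldata is not what the script's semantics would lead one to expect.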

But if you comment out the {code} COUNT(myfilter1) as col2, {code} line, it seems to work, with the
following results:
(0,22,45.0,2L)
(0,24,133.0,6L)
(0,25,22.0,1L)

I also tried to verify whether the issue is that {code} COUNT(myfilter1) as col2, {code}
returns zero. That does not seem to be the case.
If the mycountdata statements and {code} dump mycountdata; {code} are uncommented, it returns:
(1L)
(1L)

I am attaching the tab-separated 'mystudentfile.txt' file used in this Pig script. Is this
an issue with two FILTERs in the nested FOREACH followed by a COUNT on those filters?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

