pig-dev mailing list archives

From "Prashant Kommireddi (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2610) GC errors on using FILTER within nested FOREACH
Date Thu, 22 Mar 2012 20:44:22 GMT
GC errors on using FILTER within nested FOREACH

                 Key: PIG-2610
                 URL: https://issues.apache.org/jira/browse/PIG-2610
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.9.1
            Reporter: Prashant Kommireddi

A user has reported hitting "GC overhead limit exceeded" errors when using FILTER inside a
nested FOREACH and then aggregating the filtered field. Here is the sample Pig Latin script,
provided by the user, that triggers the issue.

raw = LOAD 'input' using MyCustomLoader();

searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount);

groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
counts = FOREACH groupedSearches {
               -- Each nested FILTER builds a separate bag per group.
               type1 = FILTER searches BY adType == 'type1';
               type2 = FILTER searches BY adType == 'type2';
               GENERATE
                   FLATTEN(group) AS (day, searchType),
                   COUNT(searches) AS numSearches,
                   SUM(searches.clickCount) AS clickCountPerSearchType,
                   SUM(type1.clickCount) AS type1ClickCount,
                   SUM(type2.clickCount) AS type2ClickCount;
}

Pig should be able to handle this case.
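
A possible workaround, sketched here under assumptions and not verified against 0.9.1, is to
move the per-type condition out of the nested FOREACH and into a plain projection before the
GROUP. With no nested FILTER, the final FOREACH contains only algebraic functions (COUNT, SUM)
over simple projections, which is the shape for which Pig can normally use the combiner
instead of building the filtered bags in the reducer. The relation and field names flagged,
groupedFlagged, type1Clicks and type2Clicks are hypothetical, and the cast assumes clickCount
can be converted to long.

```pig
raw = LOAD 'input' USING MyCustomLoader();

searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount);

-- Assumption: clickCount is numeric and castable to long.
-- Rows of other ad types contribute 0 to the per-type sums.
flagged = FOREACH searches GENERATE
               day, searchType,
               (long)clickCount AS clickCount,
               (adType == 'type1' ? (long)clickCount : 0L) AS type1Clicks,
               (adType == 'type2' ? (long)clickCount : 0L) AS type2Clicks;

groupedFlagged = GROUP flagged BY (day, searchType) PARALLEL 50;

-- No nested FILTER: only algebraic aggregates over projected fields.
counts = FOREACH groupedFlagged GENERATE
               FLATTEN(group) AS (day, searchType),
               COUNT(flagged) AS numSearches,
               SUM(flagged.clickCount) AS clickCountPerSearchType,
               SUM(flagged.type1Clicks) AS type1ClickCount,
               SUM(flagged.type2Clicks) AS type2ClickCount;
```

One behavioral difference to keep in mind: for a group with no 'type1' (or 'type2') rows,
this sketch yields 0 for the corresponding sum, whereas SUM over an empty filtered bag in the
original script yields null.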
