From: "Prashant Kommireddi (Created) (JIRA)"
To: pig-dev@hadoop.apache.org
Reply-To: dev@pig.apache.org
Date: Thu, 22 Mar 2012 20:44:22 +0000 (UTC)
Message-ID: <1181749807.4484.1332449062256.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (PIG-2610) GC errors on using FILTER within nested FOREACH

GC errors on using FILTER within nested FOREACH
-----------------------------------------------

                 Key: PIG-2610
                 URL: https://issues.apache.org/jira/browse/PIG-2610
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.9.1
            Reporter: Prashant Kommireddi


A user has reported running into GC overhead errors when using FILTER within a nested FOREACH and aggregating the filtered field.

Here is the sample PigLatin script, provided by the user, that generated this issue.

{code}
raw = LOAD 'input' using MyCustomLoader();

searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount)
           ;

groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;

counts = FOREACH groupedSearches{
               type1 = FILTER searches BY adType == 'type1';
               type2 = FILTER searches BY adType == 'type2';
               GENERATE
                   FLATTEN(group) AS (day, searchType),
                   COUNT(searches) numSearches,
                   SUM(clickCount) AS clickCountPerSearchType,
                   SUM(type1.clickCount) AS type1ClickCount,
                   SUM(type2.clickCount) AS type2ClickCount;
       };
{code}

Pig should be able to handle this case.
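
For comparison, here is a minimal sketch of one possible rewrite that avoids the nested FILTER. It is not taken from the user's report and has not been confirmed to resolve the GC errors. It assumes clickCount holds integral values (hence the cast to long), and the alias tagged and the fields type1Clicks and type2Clicks are names introduced only for this illustration. The idea is to compute the per-type click counts row by row before the GROUP, so the nested block no longer has to build filtered bags for each group.

{code}
raw = LOAD 'input' using MyCustomLoader();

searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount);

-- Assumption: clickCount is integral, so it is cast to long here.
-- Each row carries its own contribution to the per-type sums.
tagged = FOREACH searches GENERATE
             day, searchType,
             (long)clickCount AS clickCount,
             ((adType == 'type1') ? (long)clickCount : 0L) AS type1Clicks,
             ((adType == 'type2') ? (long)clickCount : 0L) AS type2Clicks;

groupedSearches = GROUP tagged BY (day, searchType) PARALLEL 50;

-- Only plain aggregates over the grouped bag remain; no nested FILTER.
counts = FOREACH groupedSearches GENERATE
             FLATTEN(group) AS (day, searchType),
             COUNT(tagged) AS numSearches,
             SUM(tagged.clickCount) AS clickCountPerSearchType,
             SUM(tagged.type1Clicks) AS type1ClickCount,
             SUM(tagged.type2Clicks) AS type2ClickCount;
{code}

Because the final FOREACH in this sketch uses only COUNT and SUM over projections of the grouped bag, Pig may be able to apply the combiner and reduce the amount of data held per group. Whether that is enough to avoid the GC overhead errors on the user's data is untested.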