From: "Ashutosh Chauhan (Commented) (JIRA)"
To: pig-dev@hadoop.apache.org
Date: Wed, 2 Nov 2011 21:55:32 +0000 (UTC)
Subject: [jira] [Commented] (PIG-2339) HCatLoader loads all the partitions in a partitioned table even though a filter clause on the partitions is specified in the Pig script
[ https://issues.apache.org/jira/browse/PIG-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142592#comment-13142592 ]

Ashutosh Chauhan commented on PIG-2339:
---------------------------------------

@Daniel, TypeCastInserter shouldn't be in this plan in the first place, correct?

> HCatLoader loads all the partitions in a partitioned table even though a filter clause on the partitions is specified in the Pig script
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2339
>                 URL: https://issues.apache.org/jira/browse/PIG-2339
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.9.1
>
>         Attachments: PIG-2339-1.patch
>
>
> A table created by HCat has the following partitions:
> hcat -e "show partitions paritionedtable"
> {quote}
> grid=AB/dt=2011_07_01
> grid=AB/dt=2011_07_02
> grid=AB/dt=2011_07_03
> grid=XY/dt=2011_07_01
> grid=XY/dt=2011_07_02
> grid=XY/dt=2011_07_03
> grid=XY/dt=2011_07_04
> ...
> {quote}
> The total number of partitions in the table is around 3200.
> A Pig script like the following accesses this data by filtering on the partition keys:
> {script}
> A = LOAD 'paritionedtable' USING org.apache.hcatalog.pig.HCatLoader();
> B = FILTER A BY grid=='AB' AND dt=='2011_07_04';
> C = LIMIT B 10;
> store C into 'HCAT' using PigStorage();
> {script}
> This script fails to run: the job.xml generated by Pig is so large (8MB) that Hadoop Fred's size limit does not allow the job to be submitted.
> Debugging showed that the function in the HCatTableInfo class receives a null filter value: getInputTableInfo(filter=null ..)
> I suspect that the "setPartitionFilter" function in Pig does not pass the filter correctly to the HCatLoader.
> This happens with both Pig 0.9 and 0.8.
> Viraj
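To illustrate why a lost partition filter matters, here is a minimal Python sketch of partition pruning. It is not Pig's or HCatalog's code: the helper names `parse_partition` and `prune_partitions`, the parsing of `key=value/key=value` paths, and modelling the filter as a dict of equality tests are all hypothetical. A filter built from the script's `grid=='AB' AND dt=='2011_07_04'` narrows the candidate list, while a `None` filter (the `filter=null` case seen in `getInputTableInfo`) returns every partition unchanged.

```python
from typing import Dict, List, Optional


def parse_partition(path: str) -> Dict[str, str]:
    """Hypothetical helper: turn 'grid=AB/dt=2011_07_01' into {'grid': 'AB', 'dt': '2011_07_01'}."""
    spec: Dict[str, str] = {}
    for component in path.split("/"):
        key, _, value = component.partition("=")
        spec[key] = value
    return spec


def prune_partitions(partitions: List[str],
                     partition_filter: Optional[Dict[str, str]]) -> List[str]:
    """Hypothetical model of loader-side pruning.

    The filter is assumed to be a conjunction of key == value tests.
    A None filter means no pruning: every partition is returned.
    """
    if partition_filter is None:
        return list(partitions)
    selected = []
    for path in partitions:
        spec = parse_partition(path)
        # Keep the partition only if every key in the filter matches.
        if all(spec.get(k) == v for k, v in partition_filter.items()):
            selected.append(path)
    return selected


# Only the partitions actually listed in the report (the full list is truncated).
partitions = [
    "grid=AB/dt=2011_07_01",
    "grid=AB/dt=2011_07_02",
    "grid=AB/dt=2011_07_03",
    "grid=XY/dt=2011_07_01",
    "grid=XY/dt=2011_07_02",
    "grid=XY/dt=2011_07_03",
    "grid=XY/dt=2011_07_04",
]

# Filter pushed down from: B = FILTER A BY grid=='AB' AND dt=='2011_07_04';
pushed = prune_partitions(partitions, {"grid": "AB", "dt": "2011_07_04"})
# Filter lost before it reaches the loader (filter=null).
lost = prune_partitions(partitions, None)

print(len(pushed), len(lost))
```

None of the seven listed partitions matches the script's filter, so `pushed` is empty while `lost` contains all seven. Scaled to the roughly 3200 partitions in the real table, a null filter means the loader must describe every partition to the job, which is consistent with the oversized job.xml reported above.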