pig-dev mailing list archives

From "Ashutosh Chauhan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2339) HCatLoader loads all the partitions in a partitioned table even though a filter clause on the partitions is specified in the Pig script
Date Wed, 02 Nov 2011 21:55:32 GMT

    [ https://issues.apache.org/jira/browse/PIG-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142592#comment-13142592 ]

Ashutosh Chauhan commented on PIG-2339:
---------------------------------------

@Daniel,
TypeCastInserter shouldn't be there in the first place in this plan. Correct?
                
> HCatLoader loads all the partitions in a partitioned table even though a filter clause on the partitions is specified in the Pig script
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2339
>                 URL: https://issues.apache.org/jira/browse/PIG-2339
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.9.1
>
>         Attachments: PIG-2339-1.patch
>
>
> A table created by HCAT has the following partitions:
> hcat -e "show partitions paritionedtable"
> {quote}
> grid=AB/dt=2011_07_01
> grid=AB/dt=2011_07_02
> grid=AB/dt=2011_07_03
> grid=XY/dt=2011_07_01
> grid=XY/dt=2011_07_02
> grid=XY/dt=2011_07_03
> grid=XY/dt=2011_07_04
> ...
> {quote}
> The total number of partitions in the table is around 3200.
> A Pig script of this nature tries to access this data using the partitions in its filter.
> {script}
> A = LOAD 'paritionedtable' USING org.apache.hcatalog.pig.HCatLoader();
> B = FILTER A BY grid=='AB' AND dt=='2011_07_04';
> C = LIMIT B 10;
> store C into 'HCAT' using PigStorage();
> {script}
> This script fails to run because the job.xml generated by Pig is so large (8MB) that the Hadoop Fred's limitation does not allow it to submit the job.
> After debugging, it was found that the function in the HCatTableInfo class gets a null filter value: getInputTableInfo(filter=null ..)
> I suspect that the "setPartitionFilter" function in Pig does not pass the filter correctly to the HCatLoader. This happens with both Pig 0.9 and 0.8.
> Viraj
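
For illustration only, here is a minimal Python sketch of why a null filter matters. It is not HCatalog or Pig code: the partition list, the select_partitions helper, and the dict-based filter are all hypothetical stand-ins. When no filter reaches the loader, every partition is selected, which is consistent with the oversized job.xml described above; when the filter is passed through, only the matching partition remains.

```python
# Hypothetical partition specs, modeled on the `show partitions` output
# in the report. The real table has around 3200 partitions.
partitions = [
    {"grid": grid, "dt": f"2011_07_{day:02d}"}
    for grid in ("AB", "XY")
    for day in range(1, 5)
]


def select_partitions(partitions, partition_filter=None):
    """Return the partitions a loader would read (hypothetical helper).

    partition_filter maps a partition key to its required value.
    None models the reported bug: no filter reached the loader,
    so every partition is selected.
    """
    if partition_filter is None:
        return list(partitions)
    return [
        p for p in partitions
        if all(p.get(key) == value for key, value in partition_filter.items())
    ]


# filter=null case: all partitions end up in the job configuration.
unfiltered = select_partitions(partitions)

# Filter pushed down: only grid=AB/dt=2011_07_04 is read.
pruned = select_partitions(partitions, {"grid": "AB", "dt": "2011_07_04"})

print(len(unfiltered), len(pruned))
```

With the real table, the unfiltered case would carry all of the roughly 3200 partitions into the job configuration.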

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
