pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4551) Partition filter is not pushed down in case of SPLIT
Date Tue, 28 Jul 2015 20:35:05 GMT

     [ https://issues.apache.org/jira/browse/PIG-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-4551:
    Attachment: pig-4551_v02_notestyet.patch

Added extra conditions:
* Only insert the merged filter when at least one of the loaders contains partition or predicate
fields. (There is no check on whether the merged filter itself references any of those fields,
since they could be renamed, etc.)

* Make sure the filters do not contain a nonDeterministicUdf.
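
As a rough illustration of what the merged filter would amount to for the query quoted below:
this sketch assumes the merged filter is simply the OR of the branch filters, which the patch may
construct differently. The alias A_merged is hypothetical and used only for this example.

{code}
A = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- Assumed merged filter: OR of both branch conditions, placed directly after
-- the LOAD so that it is a candidate for partition filter pushdown.
A_merged = FILTER A BY
    ((date == '20150501' AND pk2 == '1') AND pk3 == '127')
    OR
    (((date == '20150501' AND pk2 == '1') OR (date == '20150430' AND pk2 == '1')) AND pk3 == '127');
-- The original branch filters still run, now on the reduced input.
B = FILTER A_merged BY ((date == '20150501' AND pk2 == '1') AND pk3 == '127');
C = FILTER A_merged BY (((date == '20150501' AND pk2 == '1') OR (date == '20150430' AND pk2 == '1')) AND pk3 == '127');
{code}

Under that assumption every row is evaluated by the merged filter and then again by its branch
filter, which is the extra cost in question if the merged filter ends up not being pushed down.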

One worry with my approach is the overhead this extra filter may add, particularly when it
cannot be pushed down.

While I wait for feedback on my approach, I'll start adding test cases.

> Partition filter is not pushed down in case of SPLIT
> ----------------------------------------------------
>                 Key: PIG-4551
>                 URL: https://issues.apache.org/jira/browse/PIG-4551
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Rohini Palaniswamy
>         Attachments: pig-4551_v01_notestyet.patch, pig-4551_v02_notestyet.patch
> The query below, which has an implicit split, does not push down the partition filters and scans the whole table.
> {code}
> A = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> B = FILTER A BY ( ((date=='20150501' AND pk2=='1')) AND pk3 == '127' );
> C = FILTER A BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3 == '127' );
> {code}
> The workaround for now is to write a separate LOAD statement for each FILTER. We should do that behind the scenes during planning instead of requiring the user to do it.
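>
> For illustration, the workaround might look like the following sketch. The aliases A1 and A2 are made up here and are not part of the original report.
> {code}
> -- One LOAD per FILTER, so each filter sits directly on its own loader.
> A1 = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> B = FILTER A1 BY ( ((date=='20150501' AND pk2=='1')) AND pk3 == '127' );
> A2 = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> C = FILTER A2 BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3 == '127' );
> {code}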

This message was sent by Atlassian JIRA
