pig-dev mailing list archives

From "Yu Xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2747) Support more predicate pushdown to a data source by pulling up multiple predicates from branches using the same data source
Date Mon, 11 Jun 2012 18:16:42 GMT
Yu Xu created PIG-2747:
--------------------------

             Summary: Support more predicate pushdown to a data source by pulling up multiple
predicates from branches using the same data source
                 Key: PIG-2747
                 URL: https://issues.apache.org/jira/browse/PIG-2747
             Project: Pig
          Issue Type: Improvement
            Reporter: Yu Xu
            Priority: Minor


Consider the following example:

T = load ... ;
T1 = filter T by col == 'hello';
T2 = filter T by col == 'world';

Currently, the Pig optimizer does not combine the two predicates, so it cannot push them down
to the data source (via LoadMetadata). The data source therefore cannot do any filtering, and a
full table/file scan is required.

A more efficient workaround today is to rewrite the above script by hand into the following
equivalent one:

T = load ...;
T = filter T by col == 'hello' or col == 'world' ;
T1 = filter T by col == 'hello';
T2 = filter T by col == 'world';

The above script enables Pig to push the predicate (col == 'hello' or col == 'world') down
to the data source, which can then use available partitions/indexes for potentially much more
efficient processing.

This JIRA requests that the Pig optimizer perform this type of optimization automatically.
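
To illustrate why the rewrite is safe, here is a minimal Python sketch (not Pig code, and not
how the optimizer would actually be implemented). It simulates the example: the predicates of
all branches are combined with OR into one pre-filter, and each branch then yields the same rows
as before. The names pull_up_predicates and run_branches, and the sample rows, are hypothetical
and exist only for this illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def pull_up_predicates(branch_predicates: List[Predicate]) -> Predicate:
    """Combine the filters of all branches into a single disjunction (OR)."""
    def combined(row: Row) -> bool:
        return any(pred(row) for pred in branch_predicates)
    return combined


def run_branches(rows: List[Row], branch_predicates: List[Predicate]) -> List[List[Row]]:
    """Apply each branch filter independently to the same input rows."""
    return [[row for row in rows if pred(row)] for pred in branch_predicates]


# Hypothetical stand-in for the loaded relation T.
rows = [{"col": "hello"}, {"col": "world"}, {"col": "other"}, {"col": "hello"}]

# The two branch filters from the example (T1 and T2).
branches = [
    lambda row: row["col"] == "hello",
    lambda row: row["col"] == "world",
]

# The pulled-up predicate is what a data source could evaluate during the scan.
pushed_down = pull_up_predicates(branches)
prefiltered = [row for row in rows if pushed_down(row)]

original = run_branches(rows, branches)          # branches over the full scan
rewritten = run_branches(prefiltered, branches)  # branches over the pre-filtered scan
```

In this sketch the branches return identical results either way, while only the rows matching
at least one branch survive the pre-filter. Note the assumption built into it: every consumer
of T is a filter branch. If any branch reads T without a filter, the combined predicate would
drop rows that branch needs, so the rewrite would not apply.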



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
