hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1494) PIG Logical Optimization: Use CNF in PushUpFilter
Date Fri, 27 Aug 2010 20:24:54 GMT

    [ https://issues.apache.org/jira/browse/PIG-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903593#action_12903593 ]

Olga Natkovich commented on PIG-1494:
-------------------------------------

Can this be moved from the 0.8 release to 0.9, since we are about to branch for 0.8?

> PIG Logical Optimization: Use CNF in PushUpFilter
> -------------------------------------------------
>
>                 Key: PIG-1494
>                 URL: https://issues.apache.org/jira/browse/PIG-1494
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Swati Jain
>            Assignee: Swati Jain
>            Priority: Minor
>             Fix For: 0.8.0
>
>
> The PushUpFilter rule is not able to handle complicated boolean expressions.
> For example, the SplitFilter rule splits one LOFilter into two on "AND", but it cannot
> split an LOFilter whose top-level operator is "OR". For example:
> *ex script:*
> A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
> B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
> C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
> J1 = JOIN B by b1, C by c1;
> J2 = JOIN J1 by $0, A by a1;
> D = *Filter J2 by ( (c1 < 10) AND (a3+b3 > 10) ) OR (c2 == 5);*
> explain D;
> In the above example, PushUpFilter cannot push any filter condition past either join,
> because the condition contains columns from all branches (inputs). If we convert the
> expression into Conjunctive Normal Form (CNF), however, the conjunct (c1 < 10) OR (c2 == 5),
> which references only columns of C, can be pushed below both joins. Here is the CNF of the
> highlighted line:
> ( (c1 < 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )
> *Suggestion:* It would be a good idea to convert the LOFilter's boolean expression into CNF;
> it would then be easy to push individual parts (conjuncts) of the expression selectively.
> If we added this utility to the PushUpFilter rule itself, we would no longer need the
> SplitFilter rule.
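
The suggestion above can be sketched in a few lines of Python. This is not Pig's implementation: the tuple encoding of expressions, the helper names to_cnf, inputs_of and plan_pushdown, and the COLUMN_SOURCE map are all hypothetical, and the sketch assumes the expression contains no NOT (a NOT would first have to be pushed down to the atoms with De Morgan's laws). It converts the example filter to CNF by distributing OR over AND, then assigns each conjunct to the single input whose columns it references, if there is exactly one.

```python
import re

# Hypothetical encoding, not Pig's LogicalExpression classes: an atom is a
# string such as "c1 < 10"; a compound node is ("AND", l, r) or ("OR", l, r).


def to_cnf(expr):
    """Return expr as a list of clauses (ANDed together); each clause is a
    list of atoms (ORed together). Assumes only AND/OR nodes, i.e. no NOT."""
    if not isinstance(expr, tuple):
        return [[expr]]  # an atom is a single one-literal clause
    op, left, right = expr
    left_cnf, right_cnf = to_cnf(left), to_cnf(right)
    if op == "AND":
        # Both sides are already conjunctions of clauses: concatenate them.
        return left_cnf + right_cnf
    if op == "OR":
        # Distribute OR over AND: (L1 & L2) | (R1 & R2) becomes
        # (L1|R1) & (L1|R2) & (L2|R1) & (L2|R2).
        return [lc + rc for lc in left_cnf for rc in right_cnf]
    raise ValueError(f"unsupported operator: {op}")


def inputs_of(clause, column_source):
    """Return the set of load aliases whose columns a clause references."""
    sources = set()
    for atom in clause:
        for name in re.findall(r"[A-Za-z_]\w*", atom):
            if name in column_source:
                sources.add(column_source[name])
    return sources


def plan_pushdown(cnf, column_source):
    """Split CNF clauses into per-input filters and a residual filter."""
    pushed, residual = {}, []
    for clause in cnf:
        sources = inputs_of(clause, column_source)
        if len(sources) == 1:
            # Only one input is referenced: the clause can filter that input directly.
            pushed.setdefault(sources.pop(), []).append(clause)
        else:
            # Columns from several inputs (or none): keep it above the joins.
            residual.append(clause)
    return pushed, residual


# Column-to-input map built from the schemas in the example script (a1..c3).
COLUMN_SOURCE = {f"{alias.lower()}{i}": alias for alias in "ABC" for i in (1, 2, 3)}

# Filter J2 by ( (c1 < 10) AND (a3+b3 > 10) ) OR (c2 == 5)
filter_expr = ("OR", ("AND", "c1 < 10", "a3+b3 > 10"), "c2 == 5")
cnf = to_cnf(filter_expr)
pushed, residual = plan_pushdown(cnf, COLUMN_SOURCE)
print(cnf)       # [['c1 < 10', 'c2 == 5'], ['a3+b3 > 10', 'c2 == 5']]
print(pushed)    # {'C': [['c1 < 10', 'c2 == 5']]}
print(residual)  # [['a3+b3 > 10', 'c2 == 5']]
```

On the example filter, the conjunct (c1 < 10) OR (c2 == 5) is assigned to C, while (a3+b3 > 10) OR (c2 == 5) stays above J2 because it references A, B and C. Two limitations are worth keeping in mind: distributing OR over AND can grow the expression exponentially in the worst case, and this sketch only pushes single-input conjuncts, whereas a conjunct referencing only B and C could still be pushed below J2 onto J1.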

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

