pig-dev mailing list archives

From "Swati Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup
Date Mon, 02 Aug 2010 05:53:16 GMT

    [ https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894467#action_12894467 ]

Swati Jain commented on PIG-1530:

a) This is not a developer coding issue. The example I gave is in fact a fairly simple one.
Developer programs can be fairly complex, and it is not always easy for developers to
perform such optimizations by hand. One of the important advantages of an optimizer is that
it removes the burden of thinking about these rewrites from the developer.

b) A general filter pushup rule (as you correctly observe) must be able to push a filter as
far up as possible. This works by iteratively applying rules that push LOFilter across
individual relational operators. A simple rule must exist for pushing a filter above each
relational operator; in conjunction, these rules allow a filter to be pushed up as far as it can go.
As an example, after I added the rule for the above, I can see a program in which an LOFilter
that sits below an LOForeach-LOCogroup pair is pushed above the LOCogroup. This is the result of
applying PushUpFilter across LOCogroup and LOForeach (the latter already exists as a separate rule).
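
To illustrate the iterative application described in (b), here is a minimal sketch in Python. It is a toy model, not Pig's optimizer API: the plan is assumed to be a linear list of operator names with the data source first, and the functions can_cross_foreach, can_cross_cogroup, and push_up_filter are hypothetical names invented for this example. Real rules would also inspect the filter condition (for instance, whether it references only the group keys) before allowing a swap.

```python
# Toy model only: a plan is a linear list of operator names, data source first.
# The names mirror Pig's logical operators, but nothing here is Pig's actual API.

def can_cross_foreach(plan, i):
    """Hypothetical per-operator rule: filter at index i may move above an LOForeach."""
    return plan[i - 1] == "LOForeach"


def can_cross_cogroup(plan, i):
    """Hypothetical per-operator rule: filter at index i may move above an LOCogroup."""
    return plan[i - 1] == "LOCogroup"


ALL_RULES = [can_cross_foreach, can_cross_cogroup]


def push_up_filter(plan, rules=ALL_RULES):
    """Apply the per-operator rules repeatedly until no rule matches any more."""
    plan = list(plan)  # work on a copy; the caller's plan is left untouched
    changed = True
    while changed:
        changed = False
        for i in range(1, len(plan)):
            if plan[i] == "LOFilter" and any(rule(plan, i) for rule in rules):
                # Swap the filter with the operator directly above it.
                plan[i - 1], plan[i] = plan[i], plan[i - 1]
                changed = True
                break
    return plan


original = ["LOLoad", "LOCogroup", "LOForeach", "LOFilter"]
optimized = push_up_filter(original)
foreach_only = push_up_filter(original, rules=[can_cross_foreach])
```

With both rules the filter ends up directly below the load; with only the LOForeach rule it stops below the LOCogroup, which is the gap the new rule closes.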

c) Each relational operator has specifics that make it hard to write a single pattern, so each
operator must be handled separately to ensure that the nuances specific to it are handled
correctly. Both LOCogroup and LOJoin are examples where the rules have fairly distinct logic.
I do think, however, that there should be a single rule (with multiple patterns) that handles
pushing up an LOFilter. That is why I have added the LOCogroup optimization to
PushUpFilter instead of creating a separate rule.

>  PIG Logical Optimization: Push LOFilter above LOCogroup
> --------------------------------------------------------
>                 Key: PIG-1530
>                 URL: https://issues.apache.org/jira/browse/PIG-1530
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Swati Jain
>            Assignee: Swati Jain
>            Priority: Minor
>             Fix For: 0.8.0
> Consider the following:
> {noformat}
> A = load '<any file>' USING PigStorage(',') as (a1:int,a2:int,a3:int);
> B = load '<any file>' USING PigStorage(',') as (b1:int,b2:int,b3:int);
> G = COGROUP A by (a1,a2) , B by (b1,b2);
> D = Filter G by group.$0 + 5 > group.$1;
> explain D;
> {noformat}
> In the above example, LOFilter can be pushed above LOCogroup. Note there are some tricky
> NULL issues to think about when the Cogroup is not of type INNER (similar to issues that need
> to be thought through when pushing LOFilter on the right side of a LeftOuterJoin).
> Also note that typically the LOFilter in user programs will be below a ForEach-Cogroup
> pair. To make this really useful, we need to also implement LOFilter pushed across ForEach.
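
The rewrite in the quoted example can be checked on toy data. The following Python sketch is not Pig code: it assumes rows are plain tuples, models COGROUP with a hypothetical cogroup helper, uses made-up sample rows, and deliberately ignores the NULL and INNER cases mentioned above. Under those assumptions it shows that filtering on the group key after the cogroup gives the same result as filtering each input on its key fields before the cogroup.

```python
from collections import defaultdict


def cogroup(a_rows, b_rows):
    """Toy stand-in for COGROUP A by (a1,a2), B by (b1,b2): key is the first two fields."""
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        groups[row[:2]][0].append(row)
    for row in b_rows:
        groups[row[:2]][1].append(row)
    return dict(groups)


def key_ok(key):
    """The filter condition from the example: group.$0 + 5 > group.$1."""
    return key[0] + 5 > key[1]


# Made-up sample rows (no NULLs), purely for illustration.
A = [(1, 2, 10), (1, 9, 11), (3, 4, 12)]
B = [(1, 2, 20), (0, 7, 21), (3, 4, 22)]

# Original plan: cogroup first, then filter on the group key.
filter_after = {k: v for k, v in cogroup(A, B).items() if key_ok(k)}

# Rewritten plan: filter each input on its own key fields, then cogroup.
filter_before = cogroup(
    [r for r in A if key_ok(r[:2])],
    [r for r in B if key_ok(r[:2])],
)
```

Both plans keep only the groups keyed (1, 2) and (3, 4); the benefit of the rewritten plan is that the discarded rows never reach the cogroup.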

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.