hadoop-pig-dev mailing list archives

From "Ying He (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
Date Thu, 11 Feb 2010 00:10:28 GMT

    [ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832313#action_12832313 ]

Ying He commented on PIG-1178:
------------------------------

Here are my thoughts on using this framework to implement PruneColumns.

1. Separate prune columns and prune map keys into 2 rules. The current implementation mixes them
in one class. It's better to separate them to make each rule simpler.

2. The prune column rule can be implemented by creating a new visitor. This visitor is called
from transform(), and it visits every LogicalRelationalOperator in reverse dependency order.
Each visit(LogicalRelationalOperator) calculates the required output uids by combining the
input uids of its successors. If a node is a sink of the plan, the output uids are retrieved
from its schema. The input uids are then calculated from the output uids by looking into the
expression plan(s) of the operator. If an output uid is derived from other uids, the source uids
should be put into the input uids. For example, a+b is derived from a and b, so the input uids
should keep the uids of both a and b. Each operator should also consider its logical meaning
when calculating input uids from output uids. For example, for LOCross, the input uids should
contain at least one field from each input.

The input uids and output uids can be added into the operator as annotations.
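
To make step 2 concrete, here is a rough sketch of the backward pass. It is only an illustration:
SketchOp, RequiredUidSketch, derivedFrom and schemaUids are hypothetical names, not the real
Pig classes, and the sketch assumes a simple linear plan whose operators are already listed in
reverse dependency order. It does not handle operator-specific cases such as LOCross.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a logical operator; not the real Pig class.
class SketchOp {
    final String name;
    final List<SketchOp> successors = new ArrayList<>();
    // For each output uid, the uids it is derived from (e.g. a+b -> {a, b}).
    final Map<Long, Set<Long>> derivedFrom = new HashMap<>();
    // Uids in this operator's schema; consulted only when the operator is a sink.
    final Set<Long> schemaUids = new LinkedHashSet<>();
    // Annotations filled in by the analysis below.
    Set<Long> outputUids = new LinkedHashSet<>();
    Set<Long> inputUids = new LinkedHashSet<>();

    SketchOp(String name) {
        this.name = name;
    }
}

class RequiredUidSketch {
    // Assumes ops is listed in reverse dependency order (sinks first).
    static void annotate(List<SketchOp> ops) {
        for (SketchOp op : ops) {
            op.outputUids = new LinkedHashSet<>();
            if (op.successors.isEmpty()) {
                // Sink: everything in the schema is required.
                op.outputUids.addAll(op.schemaUids);
            } else {
                // Otherwise combine the input uids of all successors.
                for (SketchOp succ : op.successors) {
                    op.outputUids.addAll(succ.inputUids);
                }
            }
            op.inputUids = new LinkedHashSet<>();
            for (Long uid : op.outputUids) {
                Set<Long> sources = op.derivedFrom.get(uid);
                if (sources == null) {
                    op.inputUids.add(uid);        // passed through unchanged
                } else {
                    op.inputUids.addAll(sources); // keep the source uids instead
                }
            }
        }
    }
}
```

With a load producing uids 1, 2 and 3, a foreach that derives uid 4 from 1 and 2, and a store
whose schema holds uid 4, the load ends up annotated with only uids 1 and 2, so uid 3 is a
candidate for pruning.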

3. After step 2, use another visitor to go over the plan again in dependency order to prune
the columns. This can be done by reading the input and output uids annotated on each node.
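
The second pass could look roughly like the sketch below. PrunableOp and PruneSketch are
hypothetical names used only for illustration, and the sketch assumes each operator exposes its
columns as a list of uids together with the required uids annotated by the earlier pass. It
returns whether anything was removed, so the caller can tell if the plan changed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical operator carrying its column uids and the annotated required uids.
class PrunableOp {
    final String name;
    final List<Long> columns;      // uids currently in the operator's schema
    final Set<Long> requiredUids;  // output uids annotated by the earlier pass

    PrunableOp(String name, List<Long> columns, Set<Long> requiredUids) {
        this.name = name;
        this.columns = new ArrayList<>(columns);          // mutable copy
        this.requiredUids = new LinkedHashSet<>(requiredUids);
    }
}

class PruneSketch {
    // Assumes ops is listed in dependency order (sources first).
    // Returns true if at least one column was removed.
    static boolean prune(List<PrunableOp> ops) {
        boolean changed = false;
        for (PrunableOp op : ops) {
            // retainAll drops every column whose uid is not required
            // and reports whether the list was modified.
            if (op.columns.retainAll(op.requiredUids)) {
                changed = true;
            }
        }
        return changed;
    }
}
```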

4. I think it's OK to implement prune columns and prune map keys as regular rules. They just
need to override match().

public List<OperatorPlan> match(OperatorPlan plan) {
    // The whole plan is the single match.
    List<OperatorPlan> ll = new ArrayList<OperatorPlan>();
    ll.add(plan);
    return ll;
}

This method tells the optimizer that only one match is found, which is the plan itself.

5. For the Transformer class, I suggest getting rid of check() and changing void transform() into
boolean transform(). If transform() returns false, it means no transformation was made.
If it returns true, a transformation was made. The reason is that for some rules, such as the
PruneColumn rule, it is not easy to know in advance whether a change is going to be made. If we
have both check() and transform(), a lot of logic would be duplicated in these two methods.
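
As a rough illustration of point 5, the optimizer loop could rely on the boolean result alone.
SketchTransformer, SketchOptimizer and the maxIterations parameter are hypothetical names for
this sketch, not the existing API.

```java
import java.util.List;

// Hypothetical replacement for check() + void transform().
interface SketchTransformer<P> {
    // Returns true if the plan was changed, false if nothing was done.
    boolean transform(P plan);
}

class SketchOptimizer {
    // Applies every rule repeatedly until a full pass changes nothing
    // or maxIterations passes have run. Returns the number of
    // transform() calls that reported a change.
    static <P> int optimize(P plan, List<SketchTransformer<P>> rules, int maxIterations) {
        int applied = 0;
        for (int i = 0; i < maxIterations; i++) {
            boolean changed = false;
            for (SketchTransformer<P> rule : rules) {
                if (rule.transform(plan)) {
                    changed = true;
                    applied++;
                }
            }
            if (!changed) {
                break; // fixed point reached, no separate check() needed
            }
        }
        return applied;
    }
}
```

A rule like PruneColumn would do its analysis once inside transform() and simply report whether
it removed anything.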

> LogicalPlan and Optimizer are too complex and hard to work with
> ---------------------------------------------------------------
>
>                 Key: PIG-1178
>                 URL: https://issues.apache.org/jira/browse/PIG-1178
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Alan Gates
>            Assignee: Ying He
>         Attachments: expressions-2.patch, expressions.patch, lp.patch, lp.patch, pig_1178.patch,
pig_1178.patch, PIG_1178.patch
>
>
> The current implementation of the logical plan and the logical optimizer in Pig has proven
to not be easily extensible. Developer feedback has indicated that adding new rules to the
optimizer is quite burdensome. In addition, the logical plan has been an area of numerous
bugs, many of which have been difficult to fix. Developers also feel that the logical plan
is difficult to understand and maintain. The root cause for these issues is that a number
of design decisions that were made as part of the 0.2 rewrite of the front end have now proven
to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and
rebuild the logical plan with a simpler design that will make it much easier to maintain the
logical plan as well as extend the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

