hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-158) Rework logical plan
Date Mon, 24 Mar 2008 02:01:28 GMT

    [ https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581453#action_12581453
] 

Alan Gates commented on PIG-158:
--------------------------------

Comments:

BinaryExpressionOperator:  When adding classwide javadoc comments you should put them above
the declaration of the class so that javadoc picks them up and associates them with the class.
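
To illustrate the placement, here is a minimal sketch. BinaryExpressionSketch is a made-up
stand-in, not the class from the patch; the only point is where the comment sits.

```java
// Hypothetical stand-in, not the class from the patch.

/**
 * Base for expression operators that take two operands.
 * Because this comment sits directly above the class declaration,
 * the javadoc tool attaches it to the class.
 */
abstract class BinaryExpressionSketch {
    // A doc comment placed here, inside the body, would not be
    // treated as the class description.

    /** Returns the operator's symbol, for example "+". */
    abstract String symbol();
}
```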
 

You could move supportsMultipleInputs to Binary and UnaryExpressionOperator as it will be
true for all subclasses.  You could move supportsMultipleOutputs to ExpressionOperator since
all expressions only support one output.
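
A rough sketch of what that hoisting might look like.  All class names here are made up, and
the values returned for supportsMultipleInputs (true for binary, false for unary) are an
assumption for illustration; the point is only that each answer is given once, at the highest
level where it holds for every subclass.

```java
// Hypothetical names; the real operators carry far more state.
abstract class ExpressionOp {
    // Every expression produces a single output, so answer once here.
    public final boolean supportsMultipleOutputs() {
        return false;
    }

    public abstract boolean supportsMultipleInputs();
}

abstract class BinaryExpressionOp extends ExpressionOp {
    // Assumed value: a binary expression takes two operands.
    @Override
    public boolean supportsMultipleInputs() {
        return true;
    }
}

abstract class UnaryExpressionOp extends ExpressionOp {
    // Assumed value: a unary expression takes one operand.
    @Override
    public boolean supportsMultipleInputs() {
        return false;
    }
}

// Concrete operators no longer need to repeat either method.
final class AddOp extends BinaryExpressionOp {}

final class NotOp extends UnaryExpressionOp {}
```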

In LOSort.getSchema, LOSort should always have one and only one parent, so all the checking
around s.iterator().getNext() isn't necessary.  Just do s.iterator().next(), and then assert
if you get back a null.  Same goes for LODistinct.
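
Something along these lines, as a sketch.  SingleInputSketch and onlyPredecessor are
hypothetical names, and s stands for whatever collection of predecessors the plan hands back.
Note that Iterator.next() already throws NoSuchElementException on an empty collection, so the
assert only has to cover a null element.

```java
import java.util.Collection;

final class SingleInputSketch {
    /** Returns the sole predecessor; the caller guarantees exactly one exists. */
    static <T> T onlyPredecessor(Collection<T> s) {
        // No hasNext() or size checks: sort and distinct always have one parent.
        T input = s.iterator().next();
        assert input != null : "operator must have exactly one non-null predecessor";
        return input;
    }
}
```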

LOEval and LOForEach seem to be the same thing.  I think we need either one or the other.

LOMapLookup:  This should be an ExpressionOperator, rather than directly extending LogicalOperator.
 Also, it should only take one key, not an array.  For example, if there is a query like:

a = load 'myfile' as mymap map;
b = foreach a generate mymap.myfirstkey, mymap.mysecondkey;

then the resulting logical plan should have a LOGenerate operator with two expressions, a
LOMapLookup that has map of mymap and a key of myfirstkey, and a LOMapLookup that has map
of mymap, and key of mysecondkey.
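
As a sketch of that shape, with made-up class names (ExprOpSketch, MapLookupSketch,
GenerateSketch) standing in for the real operators and plain strings standing in for the map
and key operands:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ExpressionOperator.
abstract class ExprOpSketch {}

// One lookup holds one map and one key, never an array of keys.
final class MapLookupSketch extends ExprOpSketch {
    final String map;
    final String key;

    MapLookupSketch(String map, String key) {
        this.map = map;
        this.key = key;
    }
}

// Hypothetical stand-in for LOGenerate: a list of expressions.
final class GenerateSketch {
    final List<ExprOpSketch> projections;

    GenerateSketch(ExprOpSketch... projections) {
        this.projections = Collections.unmodifiableList(Arrays.asList(projections));
    }
}

final class MapLookupPlanDemo {
    // Plan for: b = foreach a generate mymap.myfirstkey, mymap.mysecondkey;
    static GenerateSketch planForExample() {
        return new GenerateSketch(
            new MapLookupSketch("mymap", "myfirstkey"),
            new MapLookupSketch("mymap", "mysecondkey"));
    }
}
```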

LOCogroup:  I think the group by cols array can be an array of ExpressionOperators.  You could
envision grouping on a transformation of the columns, but not on a relational operator.

I don't think the LOCogroup.getSchema method is correct.  The schema that results from cogroup
will be group, bag1, bag2, ...  Group may or may not be a tuple, depending on how many group
by keys there are.  The other columns are bags with tuples from each of the relations being
grouped.  So if you have 

a = group b by name;

then the resulting schema (assuming name is of type bytearray) is: (bytearray, bag).

If you have 

a = cogroup b by name, c by name;

then the resulting schema is (tuple, bag, bag).
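
The overall shape (group column first, then one bag per input relation) could be sketched as
below.  CogroupSchemaSketch is a hypothetical helper that works on type names only, and it
takes the type of the group column as a parameter rather than trying to derive it.

```java
import java.util.ArrayList;
import java.util.List;

final class CogroupSchemaSketch {
    /**
     * Builds the column types of a (co)group result: the group column,
     * followed by one bag for each relation being grouped.
     */
    static List<String> schemaTypes(String groupType, int numInputs) {
        if (numInputs < 1) {
            throw new IllegalArgumentException("need at least one input relation");
        }
        List<String> types = new ArrayList<>();
        types.add(groupType);
        for (int i = 0; i < numInputs; i++) {
            types.add("bag");
        }
        return types;
    }
}
```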

Why did LOUnion get totally commented out?  We still need an LOUnion.  Same goes for LOVisitor.visitUnion().

LOGenerate's mProjections array should be an array of ExpressionOperators.

LOVisitor.visitFilter() should not visit the filter's input.  The general graph walking algorithm
in PlanVisitor will handle this.
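
To show the division of labor, here is a toy sketch.  PlanNodeSketch and WalkingVisitorSketch
are made-up names, and the depth-first, inputs-first order is an assumption; it is not meant
to mirror how PlanVisitor actually walks.  The walk reaches every operator once, so visit()
only does work for the node it was handed and never recurses into inputs itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical plan node: a name plus the operators feeding it.
final class PlanNodeSketch {
    final String name;
    final List<PlanNodeSketch> inputs = new ArrayList<>();

    PlanNodeSketch(String name, PlanNodeSketch... inputs) {
        this.name = name;
        for (PlanNodeSketch in : inputs) {
            this.inputs.add(in);
        }
    }
}

final class WalkingVisitorSketch {
    final List<String> visited = new ArrayList<>();
    private final Set<PlanNodeSketch> seen = new HashSet<>();

    // The walker owns traversal: inputs first, each node at most once.
    void walk(PlanNodeSketch node) {
        if (!seen.add(node)) {
            return;
        }
        for (PlanNodeSketch in : node.inputs) {
            walk(in);
        }
        visit(node);
    }

    // The visit method handles only this node; it does not touch the inputs.
    void visit(PlanNodeSketch node) {
        visited.add(node.name);
    }
}
```

If visit() also descended into the inputs, the load feeding a filter would be visited twice:
once by the walker and once by the filter's visit.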





> Rework logical plan
> -------------------
>
>                 Key: PIG-158
>                 URL: https://issues.apache.org/jira/browse/PIG-158
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: logical_operators.patch
>
>
> Rework the logical plan in line with http://wiki.apache.org/pig/PigExecutionModel

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

