hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-161) Rework physical plan
Date Mon, 31 Mar 2008 18:30:24 GMT

    [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583786#action_12583786
] 

Alan Gates commented on PIG-161:
--------------------------------

Shravan, this looks good.  You've put a lot of work into this.  I have a
number of comments below.  Most of them are small, though a few will require a
bit of work.  I'm going to go ahead and commit your changes as I think it will
be easier for you to make the requested changes that way.  It will also make
it easier for others to work with and build on the changes.

Comments:

In general, I removed commented out tests that you had moved to junit.

POFilter:

1.1 Filter's compOperand should always return type Boolean.  The type checker will
assure this.  If in the future we decide to be C like and allow expressions
like: "filter b by $0" implying that $0 is non-null or non-zero or something
than it will be the task of the type checker to put a cast in there to make it
work.

1.2 Line 88, why is it an error for the predecessor to pass a NULL?

1.3 When processing the results from the compOperand, it calls continue if
compOperand returns any status besides OK.  Why?  Shouldn't it pass on errors?
compOperand could also legitimately return NULL, in which case continue is the correct
behavior.

PhysicalOperator:

2.1 Why does the function setInputAttached exist?  In what case do we want to
set this without actually attaching the input (that is, without calling
attachInput)?

2.2 Default implementations of getNext() should return ERR, not null.

GreaterThanExpr:

3.1 Line 49, If it receives a null, there's no need to warn, just pass it on.
If it receives an error or end (which is unexpected in a greater than) it
should return an error rather than just swallow it.  Same applies to other
status checks.

POProject:

4.1 Why is getNext() with no arguments public?  Do you anticipate outsiders
calling it?  It seems like it should be private.

4.2 Line 117, you have res set up to return a NULL (meaning you're just going
to ignore that field) and you're calling log fatal.  In general, log fatal
should indicate that the engine has hit an unrecoverable error and the job
should fail now without retry.  This should be a warning instead.

4.3 getNext(Tuple), there is logic in here to handle the fact that it may be
pulling straight tuples out of the top tuple or it may be pulling tuples out of a
bag that's in the top tuple.  That is, for Project($1), you may have a base
tuple of [x, [a, b, c]] or [x, {[a, b, c], [d, e, f]}], where [] indicates a
tuple and {} a bag.  You should know this up front from the logical plan.  So
I'm wondering if it would be better to set that up front and remember it there
rather than checking types on each call of getNext().

ExprPlanVisitor:

5.1 visitGreaterThan should visit its constituent expressions (ie
visit.getLhs, visit.getRhs).

PhyPlanVisitor:

6.1 visitFilter should visit its comparison operator.

OperatorHelper

7 Should these functions be moved into DataType?

TestFilter

8 You should create a small (ten line?) input set with some tuples that will
return true and some false against the filter, and then run that through the
filter and check that you get only the true tuples back.

TestProject

9 You should add tests for projecting scalar types, a tuple, and a bag of
tuples.




> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: Phy_AbsClass.patch
>
>
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message