pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-161) Rework physical plan
Date Tue, 01 Apr 2008 13:56:26 GMT

    [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584143#action_12584143

Shravan Matthur Narayanamurthy commented on PIG-161:

Thanks for the comments and for checking the patch in Alan. Its definitely easier to modify
existing classes and I won't be the blocker for shubham's patches. :)

Responses in lin...


In general, I removed commented out tests that you had moved to junit.
[shrav] Oh thanks! I forgot to remove them I guess.


1.1 Filter's compOperand should always return type Boolean. The type checker will
assure this. If in the future we decide to be C like and allow expressions
like: "filter b by $0" implying that $0 is non-null or non-zero or something
than it will be the task of the type checker to put a cast in there to make it
[shrav] You are right. But for the getNext calls to cascade we need to know the result type
of the operands of compOp as they can be anything. Hence, instead of using the result type
of the comparison operator, I use the operands' result type. 

1.2 Line 88, why is it an error for the predecessor to pass a NULL?
[shrav] I am not sure I understand this. So are you saying that the NULL returned should be
passed up the pipeline? I thought that we had agreed on not passing NULLs through the pipeline.

1.3 When processing the results from the compOperand, it calls continue if
compOperand returns any status besides OK. Why? Shouldn't it pass on errors?
compOperand could also legitimately return NULL, in which case continue is the correct
[shrav] I have implemented it in a greedy fashion. It tries to avoid passing errors on as
it is happening only inside its scope and tries in hope that the next tuple processed will
not throw an error. If this is wrong, I can change it to pass errors on. I have followed the
same philosphy everywhere.


2.1 Why does the function setInputAttached exist? In what case do we want to
set this without actually attaching the input (that is, without calling
[shrav] Yeah I think it might not be needed. I just put it in along with getters and setters
for other parameters I guess. I will remove that.

2.2 Default implementations of getNext() should return ERR, not null.
[shrav] It does return ERR I thought. It returns a new instance of Result which has the returnStatus
set to ERR.


3.1 Line 49, If it receives a null, there's no need to warn, just pass it on.
If it receives an error or end (which is unexpected in a greater than) it
should return an error rather than just swallow it. Same applies to other
status checks.
[shrav] See my earlier response.


4.1 Why is getNext() with no arguments public? Do you anticipate outsiders
calling it? It seems like it should be private.
[shrav] It was private before. I neede to use it in the unit tests instead of calling getNext
with each data type. I guess I was lazy. :) Should I change this?

4.2 Line 117, you have res set up to return a NULL (meaning you're just going
to ignore that field) and you're calling log fatal. In general, log fatal
should indicate that the engine has hit an unrecoverable error and the job
should fail now without retry. This should be a warning instead.
[shrav] You are right I messed that one up!

4.3 getNext(Tuple), there is logic in here to handle the fact that it may be
pulling straight tuples out of the top tuple or it may be pulling tuples out of a
bag that's in the top tuple. That is, for Project($1), you may have a base
tuple of [x, [a, b, c]] or [x, {[a, b, c], [d, e, f]}], where [] indicates a
tuple and {} a bag. You should know this up front from the logical plan. So
I'm wondering if it would be better to set that up front and remember it there
rather than checking types on each call of getNext().
[shrav]  Good Idea. I will modify this in my next patch.


5.1 visitGreaterThan should visit its constituent expressions (ie
visit.getLhs, visit.getRhs).
[shrav] I guess this was what I was suggesting for the logical operator as well. I maintain
a separate plan for expressions. The ExprPlan consists of all the expression operators in
the greater than expression and ExprPlanVisitor is used only for expressions. So the visit
will automatically visit lhs and rhs and probably another input like for a ternary one as


6.1 visitFilter should visit its comparison operator.
[shrav] I thought this should be left to some concrete implementation of the PhyPlanVisitor.


7 Should these functions be moved into DataType?
[shrav] You are right. It should belong there I guess.


8 You should create a small (ten line?) input set with some tuples that will
return true and some false against the filter, and then run that through the
filter and check that you get only the true tuples back.

[shrav] I just tested it with a static condition which either returns a pass or fail and tested
by passing the tuple. I wasn't sure if the unit test for just the filter needed the full case.
I can add them.

9 You should add tests for projecting scalar types, a tuple, and a bag of
[shrav] The scalar types and direct bag have been covered. I can add a tuple also into the
random tuple thats being generated. So I will just have to add one more test case for bag
of tuples.

> Rework physical plan
> --------------------
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: Phy_AbsClass.patch
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message