hadoop-pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-161) Rework physical plan
Date Sat, 03 May 2008 01:00:58 GMT

    [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593948#action_12593948 ]

Shravan Matthur Narayanamurthy commented on PIG-161:
----------------------------------------------------

    2) Several places in getNext are checking if func is null. The constructor should instead
guarantee that the function has been instantiated, and then no checks should be done in getNext or
anything it calls. This code is going to run once for every record processed, so we want to
remove every instruction we can from it.

    Shubham>> Shravan had pointed out earlier that POUserFunc might not be serializable
because of EvalFunc not being serializable. So I had to declare the Object func as transient.
The null checking is to make sure that after deserialization func is instantiated with EvalFunc/ComparisonFunc.

Ok, but I assume that after deserialization on the MR side, func only needs to be instantiated
once, but you are checking for it in a number of places. It only needs to be checked once,
preferably outside of the getNext loop if possible.

[shrav] I guess there are two ways of solving this. One is the way Shubham has implemented it, and
the other is to assume that some external code will ensure the appropriate state after
deserialization. If we take the second approach, I will have to do it in the configure()
method of the mapper or the reducer. For this I would also need to maintain a list of the
POUserFuncs that need to be configured by a call to instantiateFunc. It is certainly
possible, but I was thinking of pushing it to the perf tuning stage as we are already behind schedule.
I will try to fit this in, but will get to it only after I finish the other stuff.

Also, I guess that with Shubham's new patch, which separates out the Comparison func, the check
will appear in only one place, inside the generic getNext() method.
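
The two options discussed above can be sketched roughly as follows. This is only an illustration, not the code in the patch: UpperFunc, UserFuncOp, TaskSetup, getNextLazy and roundTrip are hypothetical names standing in for EvalFunc, POUserFunc and the mapper/reducer setup, and the static configure(List) here is a stand-in for the configure() method of the mapper or reducer, not the Hadoop signature.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

// Hypothetical stand-in for a user function that is NOT serializable (the role of EvalFunc).
class UpperFunc {
    String exec(String input) { return input.toUpperCase(); }
}

// Hypothetical stand-in for POUserFunc: only the class name is serialized with the plan.
class UserFuncOp implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String funcClassName;
    private transient Object func; // transient, so it is null after deserialization

    UserFuncOp(String funcClassName) {
        this.funcClassName = funcClassName;
        instantiateFunc();
    }

    // Rebuilds the function from its class name.
    void instantiateFunc() {
        try {
            func = Class.forName(funcClassName).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + funcClassName, e);
        }
    }

    boolean isReady() { return func != null; }

    // Option 1: a null check on every record; safe, but it sits in the per-record path.
    String getNextLazy(String record) {
        if (func == null) {
            instantiateFunc();
        }
        return ((UpperFunc) func).exec(record);
    }

    // Option 2: no check at all; correct only if configure() ran before the first record.
    String getNext(String record) {
        return ((UpperFunc) func).exec(record);
    }
}

// Hypothetical task-side setup, playing the role of the mapper's or reducer's configure().
class TaskSetup {
    // Instantiates every user function once, before any record is processed.
    static void configure(List<UserFuncOp> userFuncs) {
        for (UserFuncOp op : userFuncs) {
            op.instantiateFunc();
        }
    }

    // Serializes and deserializes an operator, simulating shipping the plan to a task.
    static UserFuncOp roundTrip(UserFuncOp op) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(op);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (UserFuncOp) in.readObject();
        }
    }
}
```

With option 2, the list of operators passed to configure() is the extra bookkeeping mentioned above; if an operator is left out of that list, getNext() fails with a NullPointerException on the first record rather than recovering silently.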

Do you think this is ok Alan?

> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: arithmeticOperators.patch, incr2.patch, incr3.patch, incr4.patch,
>                      incr5.patch, MRCompilerTests_PlansAndOutputs.txt, Phy_AbsClass.patch,
>                      physicalOps.patch, physicalOps.patch, physicalOps.patch, podistinct.patch,
>                      pogenerate.patch, pogenerate.patch, pogenerate.patch, posort.patch
>
>
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

