hadoop-pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-161) Rework physical plan
Date Tue, 15 Apr 2008 00:13:04 GMT

    [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588851#action_12588851 ]

Shravan Matthur Narayanamurthy commented on PIG-161:
----------------------------------------------------

Response in line...

Questions on incr4 patch

1) According to the PigExecutionModel document:

The Local Rearrange takes the input tuple and outputs a key, value pair with the group field
as the key and the tuple as the value. For eg., (1,R) will be converted to {1,(1,R)}. Also
the tuple is tagged with the input index it originated from. In our case, if (1,R) came from
A it would be tagged 1 and if it was from B it would be tagged 2.

Yet in LocalRearrange.constructLROutput() you are creating [key, {[index, tuple]}] where []
indicates tuple and {} bag. So you are creating a tuple with the key and a bag, and in that
bag you are putting the indexed tuple. Why the bag? You always have only one element, the
bag seems like it's just baggage.

[shrav] I wanted the package step to see a uniform input whether or not a Combiner is used,
but I don't think the bag achieves that. My thinking was that if the combiner outputs a bag,
package would face two different input shapes depending on whether the combiner runs, so I
put the LR output into a bag as well. I guess it doesn't make much sense to put the
List<IndexedTuples> into a Bag and then go through all the contents again over multiple bags.
Also, a combiner doesn't make much sense when the operation is just a group, I guess. Is there
a uniform way of handling this irrespective of whether we use the combiner or not?
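
For reference, here is a rough illustration of the two output shapes under discussion. This is only a sketch: plain Java lists stand in for Pig tuples and bags, and the class and method names (LocalRearrangeSketch, flatOutput, baggedOutput) are hypothetical, not the code in LocalRearrange.constructLROutput().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: java.util.List stands in for both Pig tuples and bags.
class LocalRearrangeSketch {

    // Tags a tuple with the index of the input it came from: [index, tuple].
    static List<Object> indexed(int index, List<Object> tuple) {
        List<Object> out = new ArrayList<>();
        out.add(index);
        out.add(tuple);
        return out;
    }

    // Shape described in the PigExecutionModel document: [key, [index, tuple]].
    static List<Object> flatOutput(Object key, int index, List<Object> tuple) {
        List<Object> out = new ArrayList<>();
        out.add(key);
        out.add(indexed(index, tuple));
        return out;
    }

    // Shape questioned in the review: [key, {[index, tuple]}],
    // where the bag always holds exactly one element.
    static List<Object> baggedOutput(Object key, int index, List<Object> tuple) {
        List<Object> bag = new ArrayList<>();
        bag.add(indexed(index, tuple));
        List<Object> out = new ArrayList<>();
        out.add(key);
        out.add(bag);
        return out;
    }

    public static void main(String[] args) {
        // The tuple (1,R) coming from input A, which is tagged with index 1.
        List<Object> t = Arrays.<Object>asList(1, "R");
        System.out.println(flatOutput(1, 1, t));   // prints [1, [1, [1, R]]]
        System.out.println(baggedOutput(1, 1, t)); // prints [1, [[1, [1, R]]]]
    }
}
```

The only difference between the two is the extra level of nesting, which the consumer has to unwrap for every record.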

2) In POPackage, we want eventually to be able to push eval funcs into the getNext loop of
this. As currently implemented, all of the bags are materialized (that is, copied off the
disk). This is fine for now. But eventually we want to be able to push eval functions into
that loop ...

[shrav] I don't think there is anything that restricts it. I think we can eliminate the
materialization even now. If POPackage is not a separate operator but part of the reducer,
the evaluation of the reduce plan, which is the plan executed on the reduce side, can handle
this already. If you think we need to do this, I can do it now.
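
To make the idea concrete, here is a minimal sketch of feeding values to an eval function inside the package loop instead of first copying them into a bag. Everything in it is an assumption for illustration: the StreamingEval interface, the Count function, and packageAndEval are hypothetical names, and a plain Iterator stands in for the values the reducer receives for one key.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch, not the POPackage implementation.
class PackageLoopSketch {

    // Hypothetical accumulator-style eval function: sees one value at a time.
    interface StreamingEval<T, R> {
        void accumulate(T value);
        R result();
    }

    // Example function that counts the values it is given.
    static class Count<T> implements StreamingEval<T, Long> {
        private long n = 0;

        public void accumulate(T value) {
            n++;
        }

        public Long result() {
            return n;
        }
    }

    // Passes each value to the function as it arrives, so no bag is materialized.
    static <T, R> R packageAndEval(Iterator<T> values, StreamingEval<T, R> fn) {
        while (values.hasNext()) {
            fn.accumulate(values.next());
        }
        return fn.result();
    }

    public static void main(String[] args) {
        Iterator<String> values = Arrays.asList("a", "b", "c").iterator();
        System.out.println(packageAndEval(values, new Count<String>())); // prints 3
    }
}
```

This only works for functions that can be computed in a single pass over the values; anything that needs the whole bag at once would still require materialization.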



> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: arithmeticOperators.patch, incr2.patch, incr3.patch, incr4.patch,
>                      Phy_AbsClass.patch, pogenerate.patch, pogenerate.patch, pogenerate.patch
>
>
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

