hadoop-pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-162) Rework mapreduce submission and monitoring
Date Fri, 28 Mar 2008 23:14:24 GMT

    [ https://issues.apache.org/jira/browse/PIG-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583232#action_12583232 ]

Shravan Matthur Narayanamurthy commented on PIG-162:
----------------------------------------------------

What would be the best way to implement the Split operator? The problem with implementing
it as an operator is the buffering required. Since we are following the single-threaded
model, a blocking getNext by, say, a filter operator might actually read all the tuples from
the split, which may well be on the reduce side. Since the other branch of the split will
execute only after the filter, there is no option but to buffer all the tuples.

One way would be to replicate the pipeline during the logical to physical translation.

Another would be to construct a databag explicitly inside the Split and store all tuples from
its input in the bag, then attach the bag's iterator to the split readers. But this doesn't
sound very efficient to me.
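
To make the bag-based option concrete, here is a minimal sketch of the idea. It is not
Pig code: Operator, ListSource and BufferingSplit are hypothetical names, an ArrayList
stands in for the databag, and a null return from getNext is assumed to mean end of input.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical pull-based operator; null is assumed to signal end of input.
interface Operator<T> {
    T getNext();
}

// Hypothetical leaf operator that serves tuples from an in-memory list.
final class ListSource<T> implements Operator<T> {
    private final Iterator<T> it;

    ListSource(List<T> data) {
        this.it = data.iterator();
    }

    @Override
    public T getNext() {
        return it.hasNext() ? it.next() : null;
    }
}

// Buffering split: drains its input once into a list (standing in for a bag),
// then hands every reader its own iterator over that buffer.
final class BufferingSplit<T> {
    private final Operator<T> input;
    private List<T> buffer; // filled on the first reader request

    BufferingSplit(Operator<T> input) {
        this.input = input;
    }

    Operator<T> newReader() {
        if (buffer == null) {
            buffer = new ArrayList<>();
            // This loop is the cost discussed above: every tuple is held in memory.
            for (T t = input.getNext(); t != null; t = input.getNext()) {
                buffer.add(t);
            }
        }
        Iterator<T> it = buffer.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

Each reader can then be drained to completion before the next one starts, which fits the
single-threaded model, at the price of holding the whole input in the buffer.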

Another would be to differentiate the split processing in the map and reduce phases. On the
map side, we can follow the above approach of using a bag, since the amount of data is restricted.
On the reduce side, since we will have only one package, we can use plan folding. That is,
make the plan that the split operator feeds an attribute plan of the split. getNext() on the
split will read a tuple, attach it to the attribute plan, and return whatever the plan's
root operator's getNext returns. The folded plan can be implemented as on the map side.
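
A rough sketch of what plan folding could look like follows. It is not Pig code: FilterPlan
and FoldedSplit are hypothetical names, the folded plan is reduced to a single filter, and
null is assumed to mean both "no output for this tuple" and end of input, so this getNext
keeps pulling until the plan yields a tuple.

```java
import java.util.Iterator;
import java.util.function.Predicate;

// Hypothetical folded ("attribute") plan reduced to a single filter:
// a tuple is attached, then pulled through the plan's root with getNext().
final class FilterPlan<T> {
    private final Predicate<T> predicate;
    private T attached;

    FilterPlan(Predicate<T> predicate) {
        this.predicate = predicate;
    }

    void attach(T tuple) {
        attached = tuple;
    }

    // Returns the attached tuple if it passes the filter, otherwise null.
    T getNext() {
        T t = attached;
        attached = null;
        return (t != null && predicate.test(t)) ? t : null;
    }
}

// Split that owns the downstream plan instead of buffering for it.
final class FoldedSplit<T> {
    private final Iterator<T> input;
    private final FilterPlan<T> plan;

    FoldedSplit(Iterator<T> input, FilterPlan<T> plan) {
        this.input = input;
        this.plan = plan;
    }

    // Reads one input tuple at a time, attaches it to the folded plan and
    // returns what the plan's root produces; null once the input is exhausted.
    T getNext() {
        while (input.hasNext()) {
            plan.attach(input.next());
            T out = plan.getNext();
            if (out != null) {
                return out;
            }
        }
        return null;
    }
}
```

Only one tuple is in flight at a time here, so nothing is buffered; the trade-off is that
the split has to know about, and drive, the plan it feeds.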

Any suggestions?

> Rework mapreduce submission and monitoring
> ------------------------------------------
>
>                 Key: PIG-162
>                 URL: https://issues.apache.org/jira/browse/PIG-162
>             Project: Pig
>          Issue Type: Sub-task
>         Environment: This bug tracks work to rework the submission and monitoring interface
> to map reduce as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

