hadoop-common-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3702) add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
Date Mon, 14 Jul 2008 02:53:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613227#action_12613227 ]

Alejandro Abdelnur commented on HADOOP-3702:
--------------------------------------------

* On Chris' comment:

You are right, I missed that point in my proposed implementation.

To address that, the Chain code should clone the key and value before passing them to the
next Map in the chain.

Still, as an optimization (to avoid serializing and deserializing keys and values), a
{{passByReference}} property could be set for each link in the chain.
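
A minimal Java sketch of that trade-off. Everything in it is hypothetical: {{MutableValue}}
stands in for a reusable key or value object, {{handOff}} is an illustrative helper and not
part of the patch, and plain Java serialization stands in for whatever serialization the
Chain code would actually use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical mutable value type, standing in for a reusable key/value object.
class MutableValue implements Serializable {
    private static final long serialVersionUID = 1L;
    String text;

    MutableValue(String text) {
        this.text = text;
    }
}

class HandOffSketch {
    // Deep-copies a value by serializing and then deserializing it.
    // Java serialization is an assumption used only to keep the sketch self-contained.
    static <T extends Serializable> T cloneBySerialization(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("clone failed", e);
        }
    }

    // Returns what the next link in the chain should receive: the same object
    // when passByReference is true, otherwise an independent copy.
    static <T extends Serializable> T handOff(T value, boolean passByReference) {
        return passByReference ? value : cloneBySerialization(value);
    }
}
```

With {{passByReference}} false, a later change to the original object does not affect what
the next Map received. With it true, both links share one object and the copy is skipped,
which is only safe if neither side modifies or holds on to that object.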

* On Runping's comment:

Our current implementation already does what you suggest. That means we have our own set of
processing interfaces, not Mappers and Reducers, plus a ChainMapper and a ChainReducer that
manage the lifecycle of those private interfaces.

Using Mapper/Reducer interfaces directly is cleaner, more consistent and simpler for developers.
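
As a rough illustration of the chaining idea, here is a sketch in plain Java. It does not use
the Hadoop API: {{SimpleMapper}} is a hypothetical, simplified stand-in for a Mapper, and
{{ChainSketch.run}} is an assumed helper, not the code in the patch. Each link's output is
fed in memory to the next link, and only the last link writes to the real output.

```java
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical, simplified stand-in for a Mapper: one record in, zero or more out.
interface SimpleMapper {
    void map(String key, String value, BiConsumer<String, String> output);
}

class ChainSketch {
    // Pushes one record through mappers[index..]. Whatever a link emits is
    // handed straight to the next link; the last link's output goes to sink.
    static void run(List<SimpleMapper> mappers, int index, String key, String value,
                    BiConsumer<String, String> sink) {
        if (index == mappers.size()) {
            sink.accept(key, value);
            return;
        }
        mappers.get(index).map(key, value,
                (k, v) -> run(mappers, index + 1, k, v, sink));
    }
}
```

For example, a first link that splits a line into words and a second link that upper-cases
each word can be run with {{ChainSketch.run(chain, 0, key, value, sink)}}, without writing
the intermediate records to disk. Keys and values are immutable Strings here, so the cloning
question above does not arise in this sketch.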


> add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3702
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Minor
>         Attachments: patch3702.txt
>
>
> On the same input, we usually need to run multiple Maps one after the other without any
> Reduce. We also have to run multiple Maps after the Reduce.
> If all pre-Reduce Maps are chained together and run as a single Map, a significant amount
> of disk I/O will be avoided.
> Similarly, all post-Reduce Maps can be chained together and run in the Reduce phase after
> the Reduce.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

