hadoop-mapreduce-issues mailing list archives

From "Mariappan Asokan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
Date Mon, 09 Apr 2012 22:59:19 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250275#comment-13250275 ]

Mariappan Asokan commented on MAPREDUCE-4049:
---------------------------------------------

Hi Avner,
  I worked on MAPREDUCE-2454 (to make sort pluggable in Hadoop) and posted a patch on top of
trunk version 1221902 a while back.  I created the patch on top of trunk because ReduceTask.java
had already been refactored nicely there, and I was advised to work on the trunk version.

Please take a look at the patch file mapreduce-2454.patch posted in MAPREDUCE-2454.  If you
want, I can post a patch on top of the latest trunk.

The patch decoupled the merge from the shuffle by creating the ShuffleRunner and ShuffleCallback
interfaces.  The MergeManager implements ShuffleCallback, and the shuffle itself implements
ShuffleRunner.
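
To make the decoupling concrete, here is a minimal sketch of how the two interfaces could
fit together.  Only the names ShuffleRunner, ShuffleCallback, and MergeManager come from the
patch description above.  Every method name and signature, as well as the SimpleMergeManager
and InMemoryShuffle classes, is a hypothetical stand-in and is not taken from
mapreduce-2454.patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback: the merge side is told about each fetched map output.
interface ShuffleCallback {
    void outputFetched(String mapId, byte[] data);
    void shuffleComplete();
}

// Hypothetical runner: the shuffle side fetches outputs and reports them
// through the callback, without knowing how they will be merged.
interface ShuffleRunner {
    void run(ShuffleCallback callback);
}

// Stand-in for a merge manager. It only records what it received;
// a real merge manager would merge the segments.
class SimpleMergeManager implements ShuffleCallback {
    private final List<String> segments = new ArrayList<>();
    private boolean done = false;

    @Override
    public void outputFetched(String mapId, byte[] data) {
        segments.add(mapId + ":" + data.length);
    }

    @Override
    public void shuffleComplete() {
        done = true;
    }

    List<String> segments() { return segments; }
    boolean isDone() { return done; }
}

// Stand-in for a shuffle. It "fetches" from an in-memory map instead of
// over HTTP or RDMA, so any transport could be swapped in behind run().
class InMemoryShuffle implements ShuffleRunner {
    private final Map<String, byte[]> mapOutputs;

    InMemoryShuffle(Map<String, byte[]> mapOutputs) {
        this.mapOutputs = mapOutputs;
    }

    @Override
    public void run(ShuffleCallback callback) {
        for (Map.Entry<String, byte[]> e : mapOutputs.entrySet()) {
            callback.outputFetched(e.getKey(), e.getValue());
        }
        callback.shuffleComplete();
    }
}

public class ShuffleSketch {
    public static void main(String[] args) {
        Map<String, byte[]> outputs = new LinkedHashMap<>();
        outputs.put("map_0", new byte[] {1, 2, 3});
        outputs.put("map_1", new byte[] {4, 5});

        SimpleMergeManager merger = new SimpleMergeManager();
        ShuffleRunner shuffle = new InMemoryShuffle(outputs);
        shuffle.run(merger);  // the shuffle drives the merge only via the callback

        System.out.println(merger.segments() + " done=" + merger.isDone());
    }
}
```

In this arrangement either side can be replaced independently: a different shuffle only has
to implement the runner, and a different merge only has to implement the callback.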

Since you are making the shuffle pluggable, I notice some overlapping changes. If I can be
of any assistance in reducing the conflicts between our patches, please let me know.  Meanwhile,
I will go over the details of your patch and get back to you. Do you have a patch created on top
of trunk?

Also, I would like to hear opinions from other developers who have shown interest in this
Jira.


                
> plugin for generic shuffle service
> ----------------------------------
>
>                 Key: MAPREDUCE-4049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, task, tasktracker
>    Affects Versions: 0.23.1, 1.0.1
>            Reporter: Avner BenHanoch
>              Labels: merge, plugin, rdma, shuffle
>         Attachments: HADOOP-1.0.2.patch, HADOOP-1.0.x.patch, Hadoop Shuffle Consumer
> Plugin TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, MAPREDUCE-4049-branch-1.0.2.patch,
> mapred-site.xml, mapred.diff, src.tgz, test.diff
>
>
> Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on a shuffle plugin
> that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using
> the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable
> merge approach during the intermediate merges, resulting in much better performance.
> # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding a hidden dependency of
> NodeManager on a specific version of mapreduce shuffle (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn
> University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with suggested Top Level Design for both plugins (currently,
> based on 1.0 branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
