hadoop-mapreduce-issues mailing list archives

From "Avner BenHanoch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
Date Wed, 05 Sep 2012 09:41:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448598#comment-13448598 ]

Avner BenHanoch commented on MAPREDUCE-4049:
--------------------------------------------

_Hi Asokan,_

 * Everyone agrees that Shuffle should be decoupled from Merge.  However, the coupling lies
in the implementation of the current _Shuffle.run()_, and I think we have already clarified
that my patch has nothing to do with that.
 * It is the current _ReduceTask.run()_ and the current _mapreduce.Reducer.Context_ that use
_RawKeyValueIterator_ to move merged records, not only "raw bytes".  My patch didn't
change that.

Fixing the points above is outside the scope of my issue.

Please, if you have a concrete comment about my design, be specific.  Don't attribute the
current state of trunk to me (I don't deserve this big honor :-) ).

Avner

                
> plugin for generic shuffle service
> ----------------------------------
>
>                 Key: MAPREDUCE-4049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, task, tasktracker
>    Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>            Reporter: Avner BenHanoch
>              Labels: merge, plugin, rdma, shuffle
>         Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Consumer Plugin TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, mapred-site.xml, mapreduce-4049.patch, mapreduce-4049.patch
>
>
> Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, or InfiniBand) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, and hence get much better performance.
> # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of NodeManager with a specific version of mapreduce shuffle (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with suggested Top Level Design for both plugins (currently, based on 1.0 branch)
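
To make the plugin idea in the description above concrete, here is a minimal sketch of the consumer side only. It is not taken from the attached patches or design documents. The interface name and method, the `example.shuffle.consumer.class` key, and the sorting stand-in are all assumptions made up for illustration, and no Hadoop classes are used.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ShufflePluginSketch {

    /** Hypothetical consumer-side contract: take map outputs, return merged records. */
    public interface ShuffleConsumer {
        Iterator<String> fetchAndMerge(List<String> mapOutputs);
    }

    /** Stand-in default implementation: "merges" by sorting in memory. */
    public static class SortingConsumer implements ShuffleConsumer {
        @Override
        public Iterator<String> fetchAndMerge(List<String> mapOutputs) {
            return mapOutputs.stream().sorted().iterator();
        }
    }

    /**
     * Picks the implementation by class name from a configuration map.
     * The key is hypothetical; an RDMA-based consumer could be named here
     * without the caller changing.
     */
    public static ShuffleConsumer load(Map<String, String> conf) {
        String name = conf.getOrDefault(
                "example.shuffle.consumer.class", SortingConsumer.class.getName());
        try {
            return Class.forName(name)
                    .asSubclass(ShuffleConsumer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load shuffle consumer: " + name, e);
        }
    }

    public static void main(String[] args) {
        ShuffleConsumer consumer = load(Map.of());
        consumer.fetchAndMerge(List.of("b", "a", "c"))
                .forEachRemaining(System.out::println);
    }
}
```

The point of the sketch is only that the reduce side depends on an interface and a configured class name, so the transport (HTTP, RDMA, or anything else) can be swapped without touching the caller.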

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
