hadoop-mapreduce-issues mailing list archives

From "Avner BenHanoch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
Date Tue, 04 Sep 2012 09:31:07 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447581#comment-13447581 ]

Avner BenHanoch commented on MAPREDUCE-4049:

My design does not conflict with your design. Below is a comment *I wrote to you 4 months ago*:
_Your patch for the trunk is good enough for my needs. I can write my RDMA shuffle plugin
based on either your patch or my patch. Hence, I am not planning to submit an additional
patch for the trunk on top of your patch. (I will only submit a patch for 1.x.)_
(I have only now submitted a patch for the trunk, because of Arun's/Todd's comment on my 1.x

The academic paper I pointed to as a *Reference* should not be confused with my plugin (personally,
I consider code from academic research to be a POC and not a product). The two relevant conclusions
I take from this academic research are:
  1) Hadoop can benefit from RDMA shuffle and shuffle plugin-ability
  2) With fast shuffle, Hadoop can benefit from *additional* merge algorithms that are not
practical with slow shuffle.  
That's all!  There is no request for Hadoop to keep its coupling of shuffle with merge.  
Again, I encourage your decoupling! When your patch is accepted into the trunk, I will
adjust future versions of my plugin to follow your decoupling.

*My design should not disturb you in any way!*
When reviewing my design from the ReduceTask.java point of view, *if you merely rename ShuffleConsumerPlugin
-> ReduceFeederPlugin, you can easily see that your decoupling design can peacefully
come after my design.*
I believe the thing that disturbs you is that Hadoop currently uses 'shuffle', which invokes
'merge', while you want the opposite direction. However, this is outside the scope of my patch.
Hence, you are welcome to build your patch on top of mine. It is not really different from
building your patch on top of the current trunk.
I will be more than happy to assist you with anything you might need, and I would appreciate
it if you gave me your blessing for my commit :-)
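
To make the renaming argument concrete, here is a minimal, self-contained sketch of a reduce side that only sees a plugin contract. It is not the code from either patch: the interface shape, the class names `ShufflePluginSketch` and `DefaultShuffle`, and the `sketch.*` configuration keys are all assumptions made up for illustration. The point it tries to show is that the reduce-side driver depends only on "something that feeds me sorted input", so whether that something is called ShuffleConsumerPlugin or ReduceFeederPlugin, and whether shuffle calls merge or the other way round, stays hidden behind the contract.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ShufflePluginSketch {
    // Hypothetical configuration keys, invented for this sketch.
    static final String PLUGIN_KEY = "sketch.shuffle.consumer.plugin.class";
    static final String INPUT_KEY = "sketch.map.outputs";

    /** Hypothetical contract: the reduce side only asks the plugin for its input. */
    public interface ShuffleConsumerPlugin {
        void init(Map<String, String> conf);
        Iterator<String> run() throws Exception;
        void close();
    }

    /** Stand-in for the built-in behaviour: the shuffle fetches, then invokes the merge itself. */
    public static class DefaultShuffle implements ShuffleConsumerPlugin {
        private final List<String> fetched = new ArrayList<>();

        @Override
        public void init(Map<String, String> conf) {
            // Pretend to fetch map outputs; a real plugin would use HTTP or RDMA here.
            for (String record : conf.getOrDefault(INPUT_KEY, "").split(",")) {
                if (!record.isEmpty()) fetched.add(record);
            }
        }

        @Override
        public Iterator<String> run() {
            List<String> merged = new ArrayList<>(fetched);
            Collections.sort(merged); // the "merge" step, owned by the plugin
            return merged.iterator();
        }

        @Override
        public void close() { fetched.clear(); }
    }

    /** Instantiates the plugin named in the configuration, or the default one. */
    static ShuffleConsumerPlugin load(Map<String, String> conf) throws ReflectiveOperationException {
        String name = conf.getOrDefault(PLUGIN_KEY, DefaultShuffle.class.getName());
        return Class.forName(name)
                .asSubclass(ShuffleConsumerPlugin.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    /** Reduce-side driver: it never learns how the records were fetched or merged. */
    public static List<String> runReduce(Map<String, String> conf) throws Exception {
        ShuffleConsumerPlugin plugin = load(conf);
        plugin.init(conf);
        try {
            List<String> out = new ArrayList<>();
            Iterator<String> it = plugin.run();
            while (it.hasNext()) out.add(it.next());
            return out;
        } finally {
            plugin.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        conf.put(INPUT_KEY, "b,c,a");
        System.out.println(runReduce(conf)); // prints [a, b, c]
    }
}
```

In this sketch, an alternative transport or a different merge strategy would be a second class implementing the same interface and selected through the (hypothetical) `sketch.shuffle.consumer.plugin.class` key, with no change to `runReduce`.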

> plugin for generic shuffle service
> ----------------------------------
>                 Key: MAPREDUCE-4049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, task, tasktracker
>    Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>            Reporter: Avner BenHanoch
>              Labels: merge, plugin, rdma, shuffle
>         Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Consumer Plugin TLD.rtf, Hadoop
> Shuffle Provider Plugin TLD.rtf, mapred-site.xml, mapreduce-4049.patch, mapreduce-4049.patch
> Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on a shuffle plugin
> that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using
> the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable
> merge approach during the intermediate merges, and hence get much better performance.
> # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of
> NodeManager with a specific version of mapreduce shuffle (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn
> University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with suggested Top Level Design for both plugins (currently,
> based on 1.0 branch)

