hadoop-mapreduce-issues mailing list archives

From "Avner BenHanoch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
Date Mon, 16 Jul 2012 16:09:37 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415340#comment-13415340 ]

Avner BenHanoch commented on MAPREDUCE-4049:

Hi Arun,
With respect to your comments, please note the following:

*Regarding "MapOutputServlet to implement ShuffleProviderPlugin"*
1. I am doing that; however, please note that, for now, MapOutputServlet itself will not be a plugin. Please let me know if you want me to make it a plugin as well.
2. Making ShuffleProviderPlugin an interface (instead of an abstract class, as I wrote it) will require moving parts of its code to TaskTracker (only a few things).
3. I have no problem dropping jobDone/taskDone from the interface. My core plugin works without them (they only served as hints for cache optimization in an experimental variant of the plugin).
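Points 1-3 can be illustrated with a minimal sketch of the provider side as a plain interface, without the jobDone/taskDone hints. This is not the patch's actual API: the `initialize`/`destroy` method names, the `TaskTrackerStub` stand-in for TaskTracker, its `mapOutputLocation` helper, and the `DefaultShuffleProvider` class (playing the role of MapOutputServlet) are all hypothetical. The shared helper sits on the tracker stub, mirroring the code move described in point 2.

```java
import java.util.ArrayList;
import java.util.List;

// Provider side as a plain interface instead of an abstract class (point 2).
// No jobDone/taskDone hint callbacks (point 3). Method names are hypothetical.
interface ShuffleProviderPlugin {
    void initialize(TaskTrackerStub tracker);
    void destroy();
}

// Hypothetical stand-in for TaskTracker. Helper code that an abstract base
// class would otherwise have carried lives here instead (point 2).
class TaskTrackerStub {
    private final List<ShuffleProviderPlugin> providers = new ArrayList<>();

    // Shared helper available to every provider (hypothetical layout).
    String mapOutputLocation(String jobId, String mapId) {
        return "output/" + jobId + "/" + mapId;
    }

    void addProvider(ShuffleProviderPlugin provider) {
        provider.initialize(this);
        providers.add(provider);
    }

    void shutdown() {
        for (ShuffleProviderPlugin provider : providers) {
            provider.destroy();
        }
        providers.clear();
    }
}

// Plays the role of MapOutputServlet implementing the interface (point 1).
class DefaultShuffleProvider implements ShuffleProviderPlugin {
    private TaskTrackerStub tracker;

    @Override
    public void initialize(TaskTrackerStub tracker) {
        this.tracker = tracker;
    }

    @Override
    public void destroy() {
        tracker = null;
    }

    boolean isActive() {
        return tracker != null;
    }

    // Resolve the map output a reducer requested, via the tracker's helper.
    String locate(String jobId, String mapId) {
        if (tracker == null) {
            throw new IllegalStateException("provider not initialized");
        }
        return tracker.mapOutputLocation(jobId, mapId);
    }
}
```

With an interface, a provider can no longer inherit helper code, so anything common has to be reachable through the object passed to `initialize`; that is the trade-off point 2 refers to.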

*We did some due diligence on trunk:*
4. ShuffleConsumerPlugin: the interface will be implemented by the Shuffle class. The main change is a single 'run' method instead of the fetchOutputs and createKVIterator methods. In the Shuffle class, I moved the constructor work to an 'init' method.
5. ShuffleProviderPlugin is not needed in trunk at all; the provider plugin will be loaded as an AuxiliaryService. We have already tried that.
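Point 4 can be sketched as follows: work formerly done in the constructor moves into `init`, and one `run` call replaces the separate fetchOutputs and createKVIterator steps. This is a simplified illustration under stated assumptions, not the trunk interface: the `ShuffleContext` class, the `close` method, the generic key/value types and the `Iterator` return type of `run`, and the in-memory `LocalShuffle` class are hypothetical. A real consumer would fetch map outputs over the network and merge them on disk or in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical context passed to init(); each map output is an in-memory map here.
final class ShuffleContext<K, V> {
    final List<Map<K, V>> mapOutputs;

    ShuffleContext(List<Map<K, V>> mapOutputs) {
        this.mapOutputs = mapOutputs;
    }
}

// Simplified consumer-side contract (hypothetical signatures).
interface ShuffleConsumerPlugin<K extends Comparable<K>, V> {
    // Work formerly done in the Shuffle constructor.
    void init(ShuffleContext<K, V> context);

    // One call replacing fetchOutputs() followed by createKVIterator():
    // fetch every map output, merge, and return key-sorted groups.
    Iterator<Map.Entry<K, List<V>>> run();

    void close();
}

// In-memory stand-in for the Shuffle class.
class LocalShuffle<K extends Comparable<K>, V> implements ShuffleConsumerPlugin<K, V> {
    private ShuffleContext<K, V> context;

    @Override
    public void init(ShuffleContext<K, V> context) {
        this.context = context;
    }

    @Override
    public Iterator<Map.Entry<K, List<V>>> run() {
        if (context == null) {
            throw new IllegalStateException("init() was not called");
        }
        // TreeMap keeps keys in sorted order, as a reducer expects.
        TreeMap<K, List<V>> merged = new TreeMap<>();
        for (Map<K, V> output : context.mapOutputs) {       // stand-in for fetching each map output
            for (Map.Entry<K, V> e : output.entrySet()) {   // merge its records by key
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return merged.entrySet().iterator();
    }

    @Override
    public void close() {
        context = null;
    }
}
```

Keeping the constructor trivial and doing setup in `init` lets the framework instantiate the class reflectively by name and configure it afterwards, which is the usual pattern for pluggable components.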

*Unit tests*
6. I'll add unit test(s) for both interfaces.
7. This may delay the patch by a few days, since I have never used JUnit (I mainly program in C/C++); I am glad to learn it now. Hence, I may submit a patch without tests this week, and a patch with tests a few days after that.
8. I understand that I should mention my new test(s) in the mapred section of the smoke-tests file. Is that right?

Thank you for working with me on this.
> plugin for generic shuffle service
> ----------------------------------
>                 Key: MAPREDUCE-4049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, task, tasktracker
>    Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>            Reporter: Avner BenHanoch
>              Labels: merge, plugin, rdma, shuffle
>         Attachments: HADOOP-1.0.2.patch, HADOOP-1.0.x.patch, HADOOP-1.1.patch, HADOOP-1.x.y-review-oriented.patch, Hadoop Shuffle Consumer Plugin TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, mapred-site.xml
> Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example, we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, or InfiniBand) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also use a suitable merge approach during the intermediate merges, yielding much better performance.
> # Satisfy MAPREDUCE-3060: a generic shuffle service that avoids a hidden dependency of the NodeManager on a specific version of the mapreduce shuffle (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu of Auburn University et al., [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching two documents with the suggested top-level design (TLD) for both plugins (currently based on the 1.0 branch).
