hadoop-mapreduce-issues mailing list archives

From "Avner BenHanoch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
Date Thu, 03 Jan 2013 16:48:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543056#comment-13543056 ]

Avner BenHanoch commented on MAPREDUCE-4049:

Hi Alejandro,

On #1 - Thanks!

On #2 - YES: 
 1. A ShuffleProvider is configured for the lifetime of the TT, while a ShuffleConsumer is
configured per job.  We don't want to restart MapReduce/TaskTrackers every time we want to use
a different shuffle.

 2. In addition, I expect that a single job will use just one type of shuffle.  *Still,
the TT supports multiple jobs of multiple users with different shuffle&merge needs in parallel*.
 Hence, multiple shuffle consumers may run in parallel (in the multiple jobs) => they will
request data from multiple providers => *the TT needs multiple providers in parallel*.  (You
can consider multiple ShuffleProviders in MRv1 as equivalent to the multiple AuxiliaryServices
that are allowed in MRv2.)

 3. It could be that ShuffleConsumerX is ideal for jobs of one type, while ShuffleConsumerY
is ideal for jobs of another type (for example, Grep vs. TeraSort).  Hence, multiple ShuffleConsumer
plugins may run in parallel in multiple jobs.  Each consumer will contact its desired
shuffle provider.  Hence, all providers should be available in parallel.  (Also, one shuffle
service can be sensitive to a type of network problem that doesn't disturb other shuffle services;
hence, it should be possible to fall back to another shuffle on the fly.)
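
To illustrate points 1-3 from the consumer side, here is a minimal sketch of per-job consumer selection with a fallback. It is not taken from the attached patches: the ShuffleConsumer interface, the FixedConsumer class, the select method and the "job.shuffle.consumer" key are all hypothetical names invented for this example, and plain Java maps stand in for the job configuration.

```java
import java.util.HashMap;
import java.util.Map;

final class ShuffleFallbackSketch {

    /** Hypothetical consumer-side plugin; NOT the interface from the attached patch. */
    interface ShuffleConsumer {
        String name();

        /** Returns false if this shuffle cannot be used right now (e.g. its network is down). */
        boolean tryInit();
    }

    /** Trivial consumer whose availability is fixed at construction time. */
    static final class FixedConsumer implements ShuffleConsumer {
        private final String name;
        private final boolean available;

        FixedConsumer(String name, boolean available) {
            this.name = name;
            this.available = available;
        }

        @Override public String name() { return name; }
        @Override public boolean tryInit() { return available; }
    }

    /**
     * Picks the consumer named in the job's own settings and falls back to the
     * given default if the preferred consumer is unknown or fails to initialise.
     * The key "job.shuffle.consumer" is a made-up name for this sketch.
     */
    static ShuffleConsumer select(Map<String, String> jobConf,
                                  Map<String, ShuffleConsumer> known,
                                  ShuffleConsumer fallback) {
        String wanted = jobConf.get("job.shuffle.consumer");
        ShuffleConsumer preferred = (wanted == null) ? null : known.get(wanted);
        if (preferred != null && preferred.tryInit()) {
            return preferred;
        }
        return fallback;
    }

    public static void main(String[] args) {
        ShuffleConsumer http = new FixedConsumer("http", true);
        Map<String, ShuffleConsumer> known = new HashMap<>();
        known.put("rdma", new FixedConsumer("rdma", true));

        // Two jobs running side by side, each with its own shuffle choice.
        Map<String, String> teraSort = new HashMap<>();
        teraSort.put("job.shuffle.consumer", "rdma");
        Map<String, String> grep = new HashMap<>(); // no preference set

        System.out.println("TeraSort uses " + select(teraSort, known, http).name());
        System.out.println("Grep uses " + select(grep, known, http).name());
    }
}
```

Because the choice is read from per-job settings, changing the shuffle for one job needs no change on the daemon side, provided the matching provider is already running there.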

On the design:
 1. It is clear that a ShuffleProvider is a daemon like the TT, while a ShuffleConsumer is a client
that lives in the context of the RT.
 2. It is clear that multiple providers can run in parallel and each is able to serve the shuffle
requests it gets.
 3. A shuffle consumer instance will only contact one of the shuffle providers and will request
its desired files only from this provider.
 4. Multiple consumers in multiple jobs may contact different providers.
 5. *A shuffle provider that doesn't serve a request doesn't consume resources for it.*
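
The provider side of the five design points can be sketched roughly as follows. This is only an illustration under my own assumptions: ShuffleProvider, CountingProvider and ProviderRegistry are hypothetical names, not classes from the attached patches, and a request is reduced to a string lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class ShuffleProviderSketch {

    /** Hypothetical provider-side plugin, loaded once for the daemon's lifetime. */
    interface ShuffleProvider {
        String serve(String mapOutputId);
        int requestsServed();
    }

    /** Provider that only does work (here: bumps a counter) when it is actually contacted. */
    static final class CountingProvider implements ShuffleProvider {
        private final String name;
        private final AtomicInteger served = new AtomicInteger();

        CountingProvider(String name) { this.name = name; }

        @Override public String serve(String mapOutputId) {
            served.incrementAndGet();
            return name + ":" + mapOutputId;
        }

        @Override public int requestsServed() { return served.get(); }
    }

    /** Daemon-side registry; assumed to be filled once at startup, before any request arrives. */
    static final class ProviderRegistry {
        private final Map<String, ShuffleProvider> providers = new LinkedHashMap<>();

        void register(String name, ShuffleProvider provider) {
            providers.put(name, provider);
        }

        /** Routes a request to the single provider the consumer asked for. */
        String handle(String providerName, String mapOutputId) {
            ShuffleProvider provider = providers.get(providerName);
            if (provider == null) {
                throw new IllegalArgumentException("unknown provider: " + providerName);
            }
            return provider.serve(mapOutputId);
        }
    }

    public static void main(String[] args) {
        CountingProvider http = new CountingProvider("http");
        CountingProvider rdma = new CountingProvider("rdma");
        ProviderRegistry registry = new ProviderRegistry();
        registry.register("http", http);
        registry.register("rdma", rdma);

        // A consumer in one job talks to exactly one provider (point 3).
        System.out.println(registry.handle("rdma", "map_0001"));

        // The provider that was not contacted did no work for that request (point 5).
        System.out.println("http served " + http.requestsServed()
                + ", rdma served " + rdma.requestsServed());
    }
}
```

Both providers stay registered for the lifetime of the registry, yet each request touches only the provider named by the consumer, which is the property point 5 relies on.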

> plugin for generic shuffle service
> ----------------------------------
>                 Key: MAPREDUCE-4049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: performance, task, tasktracker
>    Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>            Reporter: Avner BenHanoch
>            Assignee: Avner BenHanoch
>              Labels: merge, plugin, rdma, shuffle
>             Fix For: 3.0.0
>         Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, mapreduce-4049.patch
> Support generic shuffle service as set of two plugins: ShuffleProvider & ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on shuffle plugin
that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using
the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable
merge approach during the intermediate merges. Hence, getting much better performance.
> # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of
NodeManager with a specific version of mapreduce shuffle (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn
University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with suggested Top Level Design for both plugins (currently,
based on 1.0 branch)
> # I am providing link for downloading UDA - Mellanox's open source plugin that implements
generic shuffle service using RDMA and levitated merge.  Note: At this phase, the code is
in C++ through JNI and you should consider it as beta only.  Still, it can serve anyone that
wants to implement or contribute to levitated merge. (Please be advised that levitated merge
is mostly suited to very fast networks) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
