hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tsuyoshi OZAWA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack
Date Fri, 14 Sep 2012 14:58:08 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455840#comment-13455840
] 

Tsuyoshi OZAWA commented on MAPREDUCE-4502:
-------------------------------------------

Chris and Karthik,

Thank you for your sharing your experience and thinking. These are very useful for me.

bq. ShuffleHandler is an auxiliary service loaded in the NodeManager. It's shared across all
containers. 

I see. I have to redesign it to run combiner in container.

bq. Carlo Curino and I experimented with this, but (a) saw only slight improvements in job
performance and (b) the changes to the AM to accommodate a new task type were extensive.

This is very interesting. In fact, I prototyped to run combiner at the end of MapTask as the
first version. And, its performance was good. In this case, I found that it's needed to add
new status to MapTask because of assuring fault tolerance. Is it acceptable for hadoop to
do that?

bq. With logic to manage skew, we're hoping that scheduling an aggressive range can have a
similar effect to combiner tasks, without introducing the new task type.

This seems to be good approach to deal with rack-level aggregation. Do you have some result
to 

bq. 1. Perform node-level aggregation (reduce) at the end of maps in co-ordination with AM.
bq. 2. Perform rack-level aggregation at the end of node-level aggregation again in co-ordination
with AM. The aggregation could be performed in parallel across the involved nodes such that
each node has aggregated values of different keys.
bq. 3. Schedule reducers taking the key-distribution into account across racks.

Nice wrap-up :-)

bq. The con will be that the shuffle won't be asynchronous to map computation, but hopefully
this wouldn't offset the gains of decreased network and disk I/O.

The balance between the gains by asynchronous processing and the one by decreasing network
and disk I/O. In my previous experiment, it deeply depends on number of reducers. I think
these gains are trade-off, so parameters are necessary to deal with various workloads.

bq. PS. http://dl.acm.org/citation.cfm?id=1901088 documents the advantages of multi-level
aggregation in the context of graph algorithms modeled as iterative MR jobs.

I'm going read it :)

It's time for me to create the new revision of design note with reflecting your opinion. 

Thanks,
-- Tsuyoshi
                
> Multi-level aggregation with combining the result of maps per node/rack
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4502
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, mrv2
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: speculative_draft.pdf
>
>
> The shuffle costs is expensive in Hadoop in spite of the existence of combiner, because
the scope of combining is limited within only one MapTask. To solve this problem, it's a good
way to aggregate the result of maps per node/rack by launch combiner.
> This JIRA is to implement the multi-level aggregation infrastructure, including combining
per container(MAPREDUCE-3902 is related), coordinating containers by application master without
breaking fault tolerance of jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message