hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2168) We should implement limits on shuffle connections to TaskTracker per job
Date Mon, 01 Nov 2010 15:57:24 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927018#action_12927018 ]

Owen O'Malley commented on MAPREDUCE-2168:

There is already a limit in the code, so it shouldn't be doing that unless you increased the

Also be aware that the shuffle was completely re-written a year ago, so the version in 0.21
and trunk is very different from the version you are running.

> We should implement limits on shuffle connections to TaskTracker per job
> -------------------------------------------------------------------------
>                 Key: MAPREDUCE-2168
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2168
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Liyin Liang
> As trailing map tasks will be hit by all reduces simultaneously, all the worker
threads of a TaskTracker's HTTP server may be occupied by one job's reduce tasks
fetching map outputs. This TaskTracker's iowait and load will then be very high (100+ in our
cluster, where we set tasktracker.http.threads to 100). What's more, other jobs' reduces have
to wait some time (possibly several minutes) to connect to the TaskTracker to fetch their map's
> So I think we should implement limits on shuffle connections:
> 1. limit the number (or perhaps the percentage) of worker threads occupied by the same job's reduces;
> 2. limit the number of worker threads serving the same map output simultaneously.
> Thoughts? 
> ps: we are using hadoop 0.19.
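
The first proposed limit (a cap on worker threads held by one job's reduces) could be sketched roughly as below. This is only an illustration under stated assumptions: the class name PerJobShuffleLimiter, its methods, and the idea of calling it from the map-output servlet are hypothetical and are not taken from the Hadoop code base or from the limit Owen mentions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch, not part of Hadoop: caps how many shuffle
 * requests a single job may have in flight on one TaskTracker.
 */
class PerJobShuffleLimiter {
    private final int maxPerJob;
    // One semaphore per job id; entries are never evicted in this sketch.
    private final ConcurrentMap<String, Semaphore> permits =
            new ConcurrentHashMap<>();

    PerJobShuffleLimiter(int maxPerJob) {
        if (maxPerJob <= 0) {
            throw new IllegalArgumentException("maxPerJob must be positive");
        }
        this.maxPerJob = maxPerJob;
    }

    /** Returns true if the job is still under its cap; never blocks. */
    boolean tryAcquire(String jobId) {
        return permits
                .computeIfAbsent(jobId, id -> new Semaphore(maxPerJob))
                .tryAcquire();
    }

    /** Must be called exactly once for each successful tryAcquire. */
    void release(String jobId) {
        Semaphore s = permits.get(jobId);
        if (s != null) {
            s.release();
        }
    }
}
```

A request handler would call tryAcquire(jobId) before serving a map output, reject or delay the request when it returns false, and call release(jobId) in a finally block. The second proposed limit could reuse the same structure keyed by map task id instead of job id.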

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
