hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5160) Hadoop reduce scheduler sometimes leaves machines idle
Date Fri, 06 Feb 2009 08:41:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671054#action_12671054 ]

Arun C Murthy commented on HADOOP-5160:
---------------------------------------

As of hadoop-0.18, the Map-Reduce scheduler assigns only 1 reducer per heartbeat and has
the necessary smarts to ensure that it correctly loads each machine up to ceil(loadfactor)
on each heartbeat. I suspect that ceil(loadfactor) causes some machines to get overloaded, which
is an unfortunate side effect that is hard to fix. I'm assuming you don't want to reduce
#reduceslots to 1 per box?
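
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual JobTracker code: the function name simulate_reduce_assignment, the definition of the load factor as (reduces in the job) / (reduce slots in the cluster), and the per-machine cap of ceil(loadfactor * slots per machine) are all hypothetical simplifications chosen for illustration.

```python
import math


def simulate_reduce_assignment(num_reduces, num_trackers, slots_per_tracker, heartbeats):
    """Toy model (hypothetical, not Hadoop code) of one-reduce-per-heartbeat scheduling.

    Assumptions made for illustration only:
      * load factor = num_reduces / total reduce slots in the cluster
      * a machine accepts a reduce only while it runs fewer than
        ceil(load factor * slots_per_tracker) reduces
      * `heartbeats` lists tracker indices in the order their heartbeats arrive
    """
    cluster_capacity = num_trackers * slots_per_tracker
    load_factor = num_reduces / cluster_capacity
    # Rounding up lets a machine take more than its fair share.
    cap = min(math.ceil(load_factor * slots_per_tracker), slots_per_tracker)

    running = [0] * num_trackers
    pending = num_reduces
    for tracker in heartbeats:
        # At most one reduce is handed out per heartbeat.
        if pending > 0 and running[tracker] < cap:
            running[tracker] += 1
            pending -= 1
    return running


# 5 reduces, 4 machines, 3 reduce slots each: fair share is 1.25 per machine,
# so the cap rounds up to 2. Machines 0 and 1 happen to heartbeat twice
# before machine 3 reports in.
print(simulate_reduce_assignment(5, 4, 3, [0, 1, 0, 1, 2, 3]))  # [2, 2, 1, 0]
```

In this model, the machines that heartbeat early fill up to the rounded-up cap, so machine 3 is left with nothing to run even though a 2/1/1/1 spread was possible. Setting the number of reduce slots to 1 per machine forces the cap to 1 and avoids this, at the cost of cluster capacity.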

> Hadoop reduce scheduler sometimes leaves machines idle
> ------------------------------------------------------
>
>                 Key: HADOOP-5160
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5160
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Nathan Marz
>
> I have a MapReduce application with number of reducers equal to the number of machines
> in the cluster (and with speculative execution turned off). However, Hadoop schedules multiple
> reduces to run on single machines and leaves other machines idle. This causes contention and
> seriously slows down the job. Hadoop should employ the simple heuristic of utilizing as many
> machines as possible when scheduling reduces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

