hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers
Date Mon, 16 Jun 2014 14:58:02 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032503#comment-14032503 ]

Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------

I'm wondering if the fact that the NodeManager memory has a fractional remainder when it's
"full" triggers the issue.  With every task being 512MB, a 2200MB node fills up at four
containers (2048MB), leaving 152MB unused on each node.  I'm guessing that with enough nodes
those remainders add up to what looks like enough space to run another task, but in reality
that task can never be scheduled because the reported free memory is fragmented across nodes.
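
To make the arithmetic concrete, here is a minimal sketch using the values quoted in the issue below (2200MB per NodeManager, 512MB per task, 7 worker nodes); the class and variable names are illustrative only, not part of any Hadoop API.
{code:java}
// Illustrative arithmetic only; all numbers come from the issue description.
public class HeadroomSketch {
    public static void main(String[] args) {
        final int nodeMb  = 2200; // yarn.nodemanager.resource.memory-mb
        final int taskMb  = 512;  // mapreduce.map.memory.mb / mapreduce.reduce.memory.mb
        final int workers = 7;

        int containersPerNode = nodeMb / taskMb;           // 4 containers fit per node
        int leftoverPerNode   = nodeMb % taskMb;           // 152MB stranded on each node
        int clusterHeadroom   = leftoverPerNode * workers; // 1064MB reported as free

        System.out.printf("containers/node=%d, leftover/node=%dMB, cluster headroom=%dMB%n",
                containersPerNode, leftoverPerNode, clusterHeadroom);
        // 1064MB of cluster-wide headroom exceeds the 512MB a task asks for, so another
        // task appears schedulable, yet no single node has 512MB free.
    }
}
{code}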

> Deadlock allocating containers for mappers and reducers
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-5928
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5928
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 2.4.0 (as packaged by HortonWorks in HDP 2.1.2)
>            Reporter: Niels Basjes
>         Attachments: Cluster fully loaded.png.jpg, MR job stuck in deadlock.png.jpg
>
>
> I have a small cluster consisting of 8 desktop-class systems (1 master + 7 workers).
> Due to the small amount of memory in these systems I configured YARN as follows:
> {quote}
> yarn.nodemanager.resource.memory-mb = 2200
> yarn.scheduler.minimum-allocation-mb = 250
> {quote}
> On my client I set:
> {quote}
> mapreduce.map.memory.mb = 512
> mapreduce.reduce.memory.mb = 512
> {quote}
> Then I ran a job with 27 mappers and 32 reducers.
> After a while I saw this deadlock occur:
> - All nodes had been filled to their maximum capacity with reducers.
> - One mapper was waiting for a container slot to start in.
> I tried killing reducer attempts but that didn't help (new reducer attempts simply took
> the existing container).
> *Workaround*:
> I set this value from my job (the default value is 0.05 = 5%):
> {quote}
> mapreduce.job.reduce.slowstart.completedmaps = 0.99f
> {quote}
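
For reference, here is a minimal driver-side sketch of applying that workaround programmatically; the property name and value come from the issue text above, while the surrounding job setup is purely illustrative.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowstartWorkaroundDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hold reducers back until 99% of the maps have completed, so reducers
        // cannot occupy every container while mappers are still waiting to run.
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.99f);

        Job job = Job.getInstance(conf, "slowstart-workaround-example");
        // ... set mapper/reducer classes and input/output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
{code}
The same setting can also be passed on the command line as -Dmapreduce.job.reduce.slowstart.completedmaps=0.99 when the job driver goes through ToolRunner/GenericOptionsParser.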



--
This message was sent by Atlassian JIRA
(v6.2#6252)
