hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
Date Fri, 14 Aug 2015 17:17:48 GMT

    [ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697411#comment-14697411 ]

Wangda Tan commented on YARN-1680:
----------------------------------

[~kasha], sorry, I haven't had a chance to work on this, so I'm unassigning myself.

I suggest we finish MAPREDUCE-6302 first to resolve this kind of deadlock (its approach looks
good to me). The availableResources calculation can be improved after that.

Thoughts?

> availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1680
>                 URL: https://issues.apache.org/jira/browse/YARN-1680
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3 
>            Reporter: Rohith Sharma K S
>            Assignee: Tan, Wangda
>         Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each, so total cluster capacity is 32GB. Cluster slow start is set to 1.
> A job's running reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because the reducer preemption calculation uses a headroom that includes the blacklisted node's memory. This makes the job hang forever: the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts the whole cluster's free memory.
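
The arithmetic in the description can be sketched as follows. This is only an illustrative model, not YARN or MRAppMaster code: the `Node` class and both helper functions are hypothetical, and the per-node usage is an assumption (the report gives only the 32GB total and the 29GB in use, so the 3GB of free memory is assumed here to sit entirely on NM-4).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a NodeManager; not a YARN class."""
    name: str
    capacity_gb: int
    used_gb: int

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


def cluster_headroom(nodes):
    """Free memory summed over every node, blacklisted or not."""
    return sum(n.free_gb for n in nodes)


def usable_headroom(nodes, blacklisted):
    """Free memory summed only over nodes the application can still use."""
    return sum(n.free_gb for n in nodes if n.name not in blacklisted)


# Assumed layout: 4 x 8GB = 32GB total, 29GB used by reducers,
# and all 3GB of free memory located on the blacklisted NM-4.
NODES = [
    Node("NM-1", 8, 8),
    Node("NM-2", 8, 8),
    Node("NM-3", 8, 8),
    Node("NM-4", 8, 5),
]
BLACKLIST = {"NM-4"}

reported = cluster_headroom(NODES)           # what the AM is told
usable = usable_headroom(NODES, BLACKLIST)   # what it can actually get
print(f"reported headroom: {reported}GB, usable headroom: {usable}GB")
```

Under these assumptions the reported headroom is 3GB while the usable headroom is 0GB, so the application master sees apparent room for pending maps and never preempts a reducer, even though no container can be placed on the free memory.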



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
