hadoop-yarn-issues mailing list archives

From "Parvez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager
Date Thu, 17 Sep 2015 17:04:07 GMT

    [ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803230#comment-14803230 ]

Parvez commented on YARN-914:
-----------------------------

Hi,

I am having trouble resizing an AWS EMR cluster that is configured with Hadoop 2.6.0.

The resize itself works, but when a node that has running containers is decommissioned, the
entire EMR cluster stops functioning. On a resize request, EMR terminates a Task Node
(EC2 instance) at random, without checking whether it has containers running on it.

In this case I would expect YARN to move the containers and the job from that node to another,
but it does not seem to do so.
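
One way to see which nodes are idle before a resize is to ask the ResourceManager. The following
is a minimal sketch, assuming the RM REST endpoint /ws/v1/cluster/nodes and its numContainers
field; the sample payload, host names, and the function name idle_nodes are illustrative and
not taken from the cluster described here.

```python
import json

# Sample shaped like the ResourceManager's /ws/v1/cluster/nodes response.
# Host names and container counts are made up for illustration.
SAMPLE = """
{"nodes": {"node": [
  {"id": "host-a:8041", "state": "RUNNING", "numContainers": 3},
  {"id": "host-b:8041", "state": "RUNNING", "numContainers": 0},
  {"id": "host-c:8041", "state": "LOST", "numContainers": 0}
]}}
"""


def idle_nodes(payload):
    """Return ids of RUNNING nodes that explicitly report zero containers."""
    data = json.loads(payload)
    nodes = (data.get("nodes") or {}).get("node") or []
    # A node with a missing numContainers field is treated as busy (conservative).
    return [n["id"] for n in nodes
            if n.get("state") == "RUNNING" and n.get("numContainers") == 0]


if __name__ == "__main__":
    print(idle_nodes(SAMPLE))
```

A node with zero containers can still hold map output that reducers have not fetched yet, as
the issue description below points out, so this check is necessary but probably not sufficient.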

Could this be related to the issue described here?

Please advise. Thank you.

> (Umbrella) Support graceful decommission of nodemanager
> -------------------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf, GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs have not been fetched by the reducers of the job, these map tasks will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a node manager.
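
The proposed behavior can be pictured as a drain loop with a timeout. This is a minimal sketch
under stated assumptions: running_containers, decommission, and the timeout parameters are
hypothetical callbacks and settings, not YARN APIs, and the design documents attached to the
issue may specify something different.

```python
import time


def graceful_decommission(running_containers, decommission, timeout_s,
                          poll_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wait for a node's containers to finish, then decommission it.

    running_containers: hypothetical callable returning the current count.
    decommission: hypothetical callable that removes the node.
    Returns True if the node drained before the timeout, else False.
    """
    deadline = clock() + timeout_s
    drained = running_containers() == 0
    while not drained and clock() < deadline:
        sleep(poll_s)
        drained = running_containers() == 0
    decommission()  # forced if the timeout expired before the node drained
    return drained
```

The clock and sleep parameters are injectable only so the loop can be exercised without real
waiting; a real implementation would also have to account for unfetched map output.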



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
