hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
Date Fri, 23 Jan 2015 02:52:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288644#comment-14288644 ]

Junping Du commented on YARN-914:
---------------------------------

Sorry for the late reply. These are all good points; a couple of comments:

bq. Sounds like we need a new state for NM, called "decommission_in_progress" when NM is draining
the containers.
Agree. We need a dedicated state for the NM in this situation, and both the AM and the RM
should be aware of it so they can handle it properly.
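The draining state discussed above could be sketched as a small state machine (a hypothetical sketch in Java; the names `NodeStatus` and `NodeStateMachine` are illustrative and not the actual YARN API):

```java
// Hypothetical sketch of a draining state between RUNNING and
// DECOMMISSIONED, visible to both the RM scheduler and the AM.
enum NodeStatus {
    RUNNING,
    DECOMMISSION_IN_PROGRESS, // draining: no new containers, existing ones finish
    DECOMMISSIONED
}

class NodeStateMachine {
    private NodeStatus state = NodeStatus.RUNNING;

    NodeStatus getState() { return state; }

    // Admin issues a graceful decommission request.
    void beginDecommission() {
        if (state == NodeStatus.RUNNING) {
            state = NodeStatus.DECOMMISSION_IN_PROGRESS;
        }
    }

    // The scheduler would consult this before placing new containers.
    boolean isSchedulable() {
        return state == NodeStatus.RUNNING;
    }

    // Called once all containers (and relevant app state) have drained.
    void finishDecommission() {
        if (state == NodeStatus.DECOMMISSION_IN_PROGRESS) {
            state = NodeStatus.DECOMMISSIONED;
        }
    }
}
```

The key point is that a node in the draining state is excluded from new allocations but is not yet reported as dead.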

bq. To clarify my early comment "all its map output are fetched or until all the applications
the node touches have completed", the question is when YARN can declare a node's state has
been gracefully drained and thus the node gracefully decommissioned ( admins can shutdown
the whole machine without any impact on jobs ). For MR, the state could be running tasks/containers
or mapper outputs. Say we have timeout of 30 minutes for decommission, it takes 3 minutes
to finish the mappers on the node, another 5 minutes for the job to finish, then YARN can
declare the node gracefully decommissioned in 8 minutes, instead of waiting for 30 minutes.
The RM knows all applications on any given NM, so once all applications on a node have completed,
the RM can mark the node "decommissioned".
As a first step, I was thinking of keeping the NM running in a low-resource mode after graceful
decommission - no running containers, no new containers spawned, no obvious resource consumption,
etc. - essentially putting these nodes into a maintenance mode. The timeout value there is
used to kill unfinished containers and release their resources. I am not sure we have to terminate
the NM after the timeout, but I would like to understand your use case here.
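The drain-or-timeout decision described above could be sketched as follows (a hypothetical sketch; `DecommissionTracker` is an illustrative name, not an existing YARN class). The node is declared decommissioned as soon as every application touching it completes, or when the timeout expires, whichever comes first - matching the 8-minutes-instead-of-30 example in the quoted comment:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: track the applications still active on a draining
// node and decide when the node can be marked DECOMMISSIONED.
class DecommissionTracker {
    private final long startMillis;
    private final long timeoutMillis;
    private final Set<String> activeApps = new HashSet<>();

    DecommissionTracker(long startMillis, long timeoutMillis, Set<String> apps) {
        this.startMillis = startMillis;
        this.timeoutMillis = timeoutMillis;
        this.activeApps.addAll(apps);
    }

    // RM calls this when an application on the node finishes.
    void appCompleted(String appId) {
        activeApps.remove(appId);
    }

    // True once the node can be marked decommissioned: either all apps on
    // the node finished (graceful) or the timeout expired (forced kill).
    boolean isDrained(long nowMillis) {
        return activeApps.isEmpty() || nowMillis - startMillis >= timeoutMillis;
    }
}
```

A timeout of 0 makes `isDrained` true immediately, which corresponds to the "just kill the containers" case for services with no state to drain.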

bq. Yes, I meant long running services. If YARN just kills the containers upon decommission
request, the impact could vary. Some services might not have states to drain. Or maybe the
services can handle the state migration on their own without YARN's help. For such services,
maybe we can just use ResourceOption's timeout for that; set timeout to 0 and NM will just
kill the containers.
I believe most of these services already handle losing nodes, since no node in a YARN cluster
can be relied on to stay up indefinitely. However, I am not sure whether they can migrate state
to a new node ahead of a predictable node loss, or whether being more or less stateless makes
more sense here. If we have an example application that can easily migrate a node's state to
another node, then we can discuss how to provide some rudimentary support for it.

bq. Given we don't plan to have applications checkpoint and migrate states, it doesn't seem
to be necessary to have YARN notify applications upon decommission requests. Just to call
it out.
This notification may still be necessary, so the AM won't add these nodes to its blacklist
if containers get killed afterwards. Thoughts?
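The blacklisting concern above could be sketched like this (a hypothetical sketch; `BlacklistPolicy` and the callback names are illustrative, not YARN API). The AM uses the decommissioning notification to distinguish containers killed by a draining node from genuine failures:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: an AM-side policy that skips blacklisting nodes
// whose containers were killed because the node is being decommissioned.
class BlacklistPolicy {
    private final Set<String> decommissioningNodes = new HashSet<>();
    private final Set<String> blacklist = new HashSet<>();

    // RM notifies the AM that a node entered the draining state.
    void onNodeDecommissioning(String nodeId) {
        decommissioningNodes.add(nodeId);
    }

    // Called when a container on nodeId exits abnormally.
    void onContainerKilled(String nodeId) {
        if (!decommissioningNodes.contains(nodeId)) {
            blacklist.add(nodeId); // only genuine failures count against the node
        }
    }

    boolean isBlacklisted(String nodeId) {
        return blacklist.contains(nodeId);
    }
}
```

Without the notification, the AM would see ordinary container kills and might wrongly penalize the node.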

bq. It might be useful to have a new state called "decommissioned_timeout", so that admins
know the node has been gracefully decommissioned or not.
As in my comments above, we should first see whether we have to terminate the NM at all. If not,
I prefer to use a "maintenance" state and let the admin decide whether to fully decommission
the node later. Again, we should go over your scenarios here.

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable
to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled
on other NMs. Furthermore, for finished map tasks, if their map outputs have not been fetched
by the reducers of the job, these map tasks will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
