hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
Date Thu, 18 Aug 2016 14:50:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426567#comment-15426567 ]

Junping Du commented on YARN-4676:
----------------------------------

Forgot to mention: thanks [~rkanter], [~vvasudev] and [~mingma] for the reviews and comments, and
[~kasha@cloudera.com] for many offline discussions!
There is still some work left after this patch gets in:
1. We need to make sure the timeout persists across RM failover/restart. [~rkanter] filed
YARN-5464 and will work on it.
2. As [~mingma] mentioned above, we need to support a JSON format consistent with DN
decommissioning. I just filed YARN-5536 to address this.
3. YARN-5311 was filed earlier for the graceful decommission documentation effort. [~danzhi],
since you already have a patch up for review, do you want to continue working on it? If so,
please feel free to reassign that JIRA to yourself.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>             Fix For: 2.9.0
>
>         Attachments: GracefulDecommissionYarnNode.pdf, GracefulDecommissionYarnNode.pdf,
>                      YARN-4676.004.patch, YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch,
>                      YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch,
>                      YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, YARN-4676.015.patch,
>                      YARN-4676.016.patch, YARN-4676.017.patch, YARN-4676.018.patch, YARN-4676.019.patch,
>                      YARN-4676.020.patch, YARN-4676.021.patch, YARN-4676.022.patch, YARN-4676.023.patch,
>                      YARN-4676.024.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to gracefully decommission
> YARN nodes. After a user issues the refreshNodes request, the ResourceManager automatically evaluates
> the status of all affected nodes to kick off decommission or recommission actions. The RM asynchronously
> tracks container and application status related to DECOMMISSIONING nodes so that it can decommission
> those nodes as soon as they are ready. The decommissioning timeout is supported at per-node granularity
> and can be updated dynamically. The mechanism naturally supports multiple independent graceful
> decommissioning “sessions”, each involving a different set of nodes with a different timeout setting.
> Such support is ideal, and necessary, for graceful decommission requests issued by external cluster
> management software rather than by a human.
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status of DECOMMISSIONING nodes
> automatically and asynchronously after the client/admin makes the graceful decommission request.
> It uses that status to decide when a node, once all running containers on it have completed, will be
> transitioned into the DECOMMISSIONED state. NodesListManager detects and handles include and
> exclude list changes to kick off decommission or recommission as necessary.
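The tracking described above can be modeled roughly as follows. This is a minimal Python sketch under stated assumptions, not the YARN-4676 patch (which is Java code inside the RM): `DecommissioningTracker`, its methods, and `diff_exclude_list` are hypothetical names, and the model only counts running containers, ignoring application status, heartbeats and RM restart. It shows the two ideas from the description: per-node timeouts that can be updated independently (so several "sessions" can coexist), and a node becoming ready for DECOMMISSIONED either when its containers finish or when its own timeout expires.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class NodeState:
    running_containers: int
    deadline: Optional[float]  # absolute clock time; None means no timeout


class DecommissioningTracker:
    """Hypothetical model of asynchronous DECOMMISSIONING-node tracking.

    Not the YARN-4676 code: it only counts running containers and ignores
    application status, node heartbeats and RM restart.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._nodes: Dict[str, NodeState] = {}

    def _deadline(self, timeout_s: Optional[float]) -> Optional[float]:
        return None if timeout_s is None else self._clock() + timeout_s

    def start(self, node: str, running_containers: int,
              timeout_s: Optional[float]) -> None:
        # Each graceful-decommission request ("session") may carry its own
        # timeout, so deadlines are stored per node rather than globally.
        self._nodes[node] = NodeState(running_containers, self._deadline(timeout_s))

    def update_timeout(self, node: str, timeout_s: Optional[float]) -> None:
        # A later request can reset the timeout of an already-tracked node.
        state = self._nodes.get(node)
        if state is not None:
            state.deadline = self._deadline(timeout_s)

    def container_finished(self, node: str) -> None:
        state = self._nodes.get(node)
        if state is not None and state.running_containers > 0:
            state.running_containers -= 1

    def recommission(self, node: str) -> None:
        # Node was removed from the exclude list: stop tracking it.
        self._nodes.pop(node, None)

    def poll(self) -> List[str]:
        """Return (and stop tracking) nodes ready to become DECOMMISSIONED."""
        now = self._clock()
        ready = sorted(
            name for name, s in self._nodes.items()
            if s.running_containers == 0
            or (s.deadline is not None and now >= s.deadline)
        )
        for name in ready:
            del self._nodes[name]
        return ready


def diff_exclude_list(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: newly excluded hosts start decommissioning,
    hosts dropped from the exclude list are recommissioned."""
    return new - old, old - new
```

Injecting the clock keeps the timeout logic testable with a fake time source; a real RM would presumably drive a check like `poll()` from heartbeats or a timer rather than on demand.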



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
