hadoop-yarn-issues mailing list archives

From "Daniel Zhi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
Date Wed, 13 Apr 2016 02:30:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238478#comment-15238478 ]

Daniel Zhi commented on YARN-4676:

1. I don't expect it to disappear by the next patch, but I will focus on other issues first.
2. I will revert these two files (I didn't notice them because my local diff tool skipped empty
3. I will restore resolve() (its removal was due to my manual merge).
4. Yes, it will simplify the code.
5. refreshNodes(long timeout) basically remains unchanged. The client enforces a timeout that
is not fully integrated with the automatic logic on the RM side (NodesListManager uses the internal
default timeout of 3600 seconds). Given that the code checks status every second, it likely
expected a smaller timeout from the command line. So the command-line timeout experience would be
the same as before. A deeper integration would pass the timeout through RefreshNodesRequest to
NodesListManager so that it is honored there. The client-side wait-and-check can remain, but there is
no need to FORCEFUL decommission, as that is supposed to happen automatically.
6. I am surprised that update() no longer throws an exception (maybe the code has evolved since the
original version). So I will remove updateNoThrow() (and will log the full exception in readDecommissioningTimeout).
7. I will add synchronized. The method will be called by every node during every heartbeat, but the
implementation is efficient enough that synchronized should not cause contention.
8. Is there a list of what "docs" includes?
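
For reference, the client-side wait-and-check described in point 5 could look roughly like the sketch below. This is only an illustration under stated assumptions: DecommissionWaiter, waitForDecommission and the allDone supplier are hypothetical names, not the actual RMAdminCLI or refreshNodes code. It polls a status check at a fixed interval until either every node is done or the command-line timeout elapses, and it does not force anything on timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of the YARN code base. */
final class DecommissionWaiter {
    private DecommissionWaiter() {}

    /**
     * Polls allDone every pollMillis until it returns true or timeoutMillis elapses.
     * Returns true if the check succeeded before the timeout, false otherwise.
     * No forceful action is taken on timeout; the caller decides what to do next.
     */
    static boolean waitForDecommission(BooleanSupplier allDone, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (allDone.getAsBoolean()) {
                return true;
            }
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.max(1L, Math.min(pollMillis, remainingMillis)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

With a one-second poll interval, as mentioned in point 5, a call would look like DecommissionWaiter.waitForDecommission(check, timeoutMillis, 1000), where check stands for whatever status query the client already performs.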

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch,
YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, YARN-4676.009.patch
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status of DECOMMISSIONING nodes
automatically and asynchronously after the client/admin makes the graceful decommission
request. It tracks each DECOMMISSIONING node's status to decide when the node, after all running containers
on it have completed, will be transitioned into the DECOMMISSIONED state. NodesListManager
detects and handles include and exclude list changes to kick off decommission or recommission
as necessary.
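
The tracking described above can be sketched as follows. This is a minimal illustration, not the DecommissioningNodeWatcher from the patch: the class DecommissionTrackerSketch, its method names, and the use of plain string node IDs are all assumptions made for the example. Each heartbeat reports the node's running container count; the node moves to DECOMMISSIONED once the count reaches zero or, as an assumed policy, once the timeout has elapsed. The methods are synchronized because every node calls in on every heartbeat.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch for illustration; not the patch's DecommissioningNodeWatcher. */
final class DecommissionTrackerSketch {
    enum State { DECOMMISSIONING, DECOMMISSIONED }

    private final long timeoutMillis;
    // nodeId -> time (ms) at which graceful decommission was requested
    private final Map<String, Long> startTimes = new HashMap<>();

    DecommissionTrackerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Start tracking a node, e.g. after it appears in the exclude list. */
    synchronized void startDecommission(String nodeId, long nowMillis) {
        startTimes.putIfAbsent(nodeId, nowMillis);
    }

    /** Stop tracking a node, e.g. after it is removed from the exclude list. */
    synchronized void recommission(String nodeId) {
        startTimes.remove(nodeId);
    }

    /** Called on each heartbeat; returns the node's state, or null if it is not tracked. */
    synchronized State onHeartbeat(String nodeId, int runningContainers, long nowMillis) {
        Long start = startTimes.get(nodeId);
        if (start == null) {
            return null;
        }
        boolean drained = runningContainers == 0;
        boolean timedOut = nowMillis - start >= timeoutMillis;
        if (drained || timedOut) {
            startTimes.remove(nodeId); // tracking ends once the node is decommissioned
            return State.DECOMMISSIONED;
        }
        return State.DECOMMISSIONING;
    }
}
```

Each heartbeat does one map lookup and at most one removal under the lock, which is the kind of cheap critical section that should keep synchronized from becoming a bottleneck.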

This message was sent by Atlassian JIRA
