hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
Date Sun, 07 Aug 2016 17:47:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411012#comment-15411012

Junping Du commented on YARN-4676:

Thanks for updating the patch, [~danzhi]! I went through the whole patch this weekend except
the documentation part. I would suggest to separate document code into a separated JIRA if
we don't want this JIRA lingering on for longer time as reviewing the doc is really a bit
difficult in general. 
My overall feeling for the patch is we are getting closer, but still many issues to address,
see below:
In HostsFileReader.java,
+  public synchronized Map<String, Integer> getExcludedHostsWithTimeout() {
+    return excludes;
+  }
Two issues here: 1. synchronized doesn't work here if other places update excludes (protected
in write lock), we should use read lock to protected. 2. we shouldn't return excludes directly
which could cause leak of writer lock because the caller can potentially update the map here.
Please refer example like getHostDetails() which is more thread safe in this case.

+      // Examples:
+      // <host><name>host1</name></host>
+      // <host><name>host2</name><timeout>123</timeout></host>
+      // <host><name>host3</name><timeout>-1</timeout></host>
Can we allow multiple hosts in the same item if their timeout value is the same, like: "<host><name>host1,
host2, host3</name><timeout>123</timeout></host>"? It sounds much
concisely and can save lots of duplicated meta info. Should be easy to address. Isn't it?

In RMAdminCLI.java,
+  public static void readXmlFileToMapWithFileInputStream(String type,
+      String filename, InputStream fileInputStream, Map<String, Integer> map)
+          throws IOException
I don't see how we throw IOException here given IOException is already catched and throw runtime
exception instead. We should remove if it is not necessary.

+      if ("-g".equals(args[1]) || "-graceful".equals(args[1])) {
If we want to support "-graceful" in addition to -g, then we should mention it in commandline
usage messages also.

In DecommissioningNodesWatcher.java,
+ * DecommissioningNodesWatcher basically is no cost when no node is
+ * DECOMMISSIONING. This sacrifices the possibility that that node once
+ * host containers of an still running application.
 I don't quite understand the last sentense here. Can you have more clarification here?

In {{void update(RMNode rmNode, NodeStatus remoteNodeStatus)}},
+    if (rmNode.getState() == NodeState.DECOMMISSIONED) {
+      if (context == null) {
+        return;
+      }
+      context.nodeState = rmNode.getState();
+      // keep DECOMMISSIONED node for a while for status log.
+      if (context.decommissionedTime == 0) {
+        context.decommissionedTime = now;
+      } else if (now - context.decommissionedTime > 60000L) {
+        decomNodes.remove(rmNode.getNodeID());
+        LOG.info("remove " + rmNode.getState() + " " + rmNode.getNodeID());
+      }
Why we are waiting for 60 seconds to remove node from decomNodes after node in DECOMMISSIONED?
Shouldn't we remove it directly?

+    long waitTime =
+        System.currentTimeMillis() - context.decommissioningStartTime;
We should use MonotonicClock to track all timeout calculations which won't affected by system
clock changes, please refer AbstractLivelinessMonitor as an example.

The logic here need more improvements: 1. it doesn't have any sleep time for monitoring interval
- I don't think we should so tightly loop here; 2. we don't check the NM that updated in latest
30 seconds or 60 seconds which sounds too gross-grained timeout tracking. 3. Many unnecessary
log operations.
I think we should follow similar logic in AbstractLivelinessMonitor although we cannot use
it directly as each node could have different expire interval. Also, we should move the whole
logic to loop staleNodes to check timeout/ready and send DECOMMISSION and a statemachine transition
that can be triggered directly when received a READY or TIMEOUT event.

I will do more tests in next 1 day or 2 to make sure it won't affect forcefully decommission
functionality and client side timeout tracking.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, GracefulDecommissionYarnNode.pdf,
YARN-4676.004.patch, YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch,
YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch,
YARN-4676.014.patch, YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, YARN-4676.018.patch,
YARN-4676.019.patch, YARN-4676.020.patch
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to graceful decommission
> YARN nodes. After user issues the refreshNodes request, ResourceManager automatically
> status of all affected nodes to kicks out decommission or recommission actions. RM asynchronously
> tracks container and application status related to DECOMMISSIONING nodes to decommission
> nodes immediately after there are ready to be decommissioned. Decommissioning timeout
at individual
> nodes granularity is supported and could be dynamically updated. The mechanism naturally
supports multiple
> independent graceful decommissioning “sessions” where each one involves different
sets of nodes with
> different timeout settings. Such support is ideal and necessary for graceful decommission
request issued
> by external cluster management software instead of human.
> DecommissioningNodeWatcher inside ResourceTrackingService tracks DECOMMISSIONING nodes
status automatically and asynchronously after client/admin made the graceful decommission
request. It tracks DECOMMISSIONING nodes status to decide when, after all running containers
on the node have completed, will be transitioned into DECOMMISSIONED state. NodesListManager
detect and handle include and exclude list changes to kick out decommission or recommission
as necessary.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message