hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-9129) Move the safemode block count into BlockManager
Date Wed, 21 Oct 2015 20:59:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967881#comment-14967881
] 

Haohui Mai edited comment on HDFS-9129 at 10/21/15 8:59 PM:
------------------------------------------------------------

Thanks for the work, [~liuml07]

{code}
+    if (status == BMSafeModeStatus.OFF) {
+      return;
+    }

+    if (status == BMSafeModeStatus.OFF) {
+      return;
+    }
{code}

There are multiple cases like the above. They should be asserts.

{code}
+  synchronized boolean isSafeModeTrackingBlocks() {
+    return namesystem.isHaEnabled() && shouldIncrementallyTrackBlocks;
+  }
{code}

The code can be simplified. {{namesystem.isHaEnabled()}} does not change in the lifecycle
of the process.

{code}
+  /**
+   * The transition trigger of the safe mode state machine.
+   *
+   * The state machine elaborates itself in code. Specially the start status is
+   * always INITIALIZED and the end status is OFF. There is no transition to
+   * INITIALIZED and no transition from OFF.
+   */
{code}

It's better to document the conditions of state transition.

{code}
+        needExtension = extension > 0 &&
+            (blockThreshold > 0 || datanodeThreshold > 0);
{code}

This can be moved under the THRESHOLD statement and become a local variable.

{code}
+      if (areThresholdsMet()) {
+        if (needExtension) {
+          // THRESHOLD -> EXTENSION
+          status = BMSafeModeStatus.EXTENSION;
+          reached = monotonicNow();
+          NameNode.stateChangeLog.info("STATE* Safe mode extension entered. \n"
+              + getSafeModeTip());
+          smmthread = new Daemon(new SafeModeMonitor());
+          smmthread.start();
+        } else {
+          // THRESHOLD -> OFF
+          leaveSafeMode();
+        }
+      }
+      initializeReplQueuesIfNecessary();
+      break;
+    case EXTENSION:
+      if (areThresholdsMet() && monotonicNow() - reached >= extension) {
+        // EXTENSION -> OFF
+        leaveSafeMode();
+      }
+      break;
{code}

{{initializeReplQueuesIfNecessary()}} should be called only once.

{code}
+      blockManager.leaveSafeMode();
+      safeModeStatus = SafeModeStatus.OFF;
+      startSecretManagerIfNecessary();
{code}

{{safeModeStatus = SafeModeStatus.OFF}} should be moved to {{BlockManagerSafeMode}}.

{code}
+  private class SafeModeMonitor implements Runnable {
+    /** interval in msec for checking safe mode: {@value}. */
+    private static final long RECHECK_INTERVAL = 1000;
+
+    @Override
+    public void run() {
+      while (namesystem.isRunning()) {
+        try {
+          namesystem.writeLock();
+          if (status == BMSafeModeStatus.OFF) {
+            // We're not in start up safe mode any more while monitoring.
+            // Chances are we left safe mode manually before thresholds and
+            // extension are reached.
+            break;
+          }
+          checkSafeMode();
+          if (!isInSafeMode()) {
+            smmthread = null;
+            break;
+          }
+        } finally {
+          namesystem.writeUnlock();
+        }
+
+        try {
+          Thread.sleep(RECHECK_INTERVAL);
+        } catch (InterruptedException ignored) {
+        }
+      }
+
+      if (!namesystem.isRunning()) {
+        LOG.info("NameNode is being shutdown, exit SafeModeMonitor thread");
+      }
+    }
+  }
{code}
A cleaner approach is to put the reached timestamp into the constructor of {{SafeModeMonitor()}}.

It might be good to have additional unit tests for BlockManagerSafeMode.



was (Author: wheat9):
Thanks for the work, [~liuml07]

{code}
+    if (status == BMSafeModeStatus.OFF) {
+      return;
+    }

+    if (status == BMSafeModeStatus.OFF) {
+      return;
+    }
{code}

There are multiple cases like the above. They should be asserts.

{code}
+  synchronized boolean isSafeModeTrackingBlocks() {
+    return namesystem.isHaEnabled() && shouldIncrementallyTrackBlocks;
+  }
{code}

The code can be simplified. {{namesystem.isHaEnabled()}} does not change in the lifecycle
of the process.

{code}
+  /**
+   * The transition trigger of the safe mode state machine.
+   *
+   * The state machine elaborates itself in code. Specially the start status is
+   * always INITIALIZED and the end status is OFF. There is no transition to
+   * INITIALIZED and no transition from OFF.
+   */
{code}

It's better to document the conditions of state transition.

{code}
+        needExtension = extension > 0 &&
+            (blockThreshold > 0 || datanodeThreshold > 0);
{code}

This can be moved under the THRESHOLD statement and become a local variable.

{code}
+      if (areThresholdsMet()) {
+        if (needExtension) {
+          // THRESHOLD -> EXTENSION
+          status = BMSafeModeStatus.EXTENSION;
+          reached = monotonicNow();
+          NameNode.stateChangeLog.info("STATE* Safe mode extension entered. \n"
+              + getSafeModeTip());
+          smmthread = new Daemon(new SafeModeMonitor());
+          smmthread.start();
+        } else {
+          // THRESHOLD -> OFF
+          leaveSafeMode();
+        }
+      }
+      initializeReplQueuesIfNecessary();
+      break;
+    case EXTENSION:
+      if (areThresholdsMet() && monotonicNow() - reached >= extension) {
+        // EXTENSION -> OFF
+        leaveSafeMode();
+      }
+      break;
{code}

{{initializeReplQueuesIfNecessary()}} should be called only once.

{code}
+      blockManager.leaveSafeMode();
+      safeModeStatus = SafeModeStatus.OFF;
+      startSecretManagerIfNecessary();
{code}

{{safeModeStatus = SafeModeStatus.OFF}} should be moved to {{BlockManagerSafeMode}}.

{code}
+  private class SafeModeMonitor implements Runnable {
+    /** interval in msec for checking safe mode: {@value}. */
+    private static final long RECHECK_INTERVAL = 1000;
+
+    @Override
+    public void run() {
+      while (namesystem.isRunning()) {
+        try {
+          namesystem.writeLock();
+          if (status == BMSafeModeStatus.OFF) {
+            // We're not in start up safe mode any more while monitoring.
+            // Chances are we left safe mode manually before thresholds and
+            // extension are reached.
+            break;
+          }
+          checkSafeMode();
+          if (!isInSafeMode()) {
+            smmthread = null;
+            break;
+          }
+        } finally {
+          namesystem.writeUnlock();
+        }
+
+        try {
+          Thread.sleep(RECHECK_INTERVAL);
+        } catch (InterruptedException ignored) {
+        }
+      }
+
+      if (!namesystem.isRunning()) {
+        LOG.info("NameNode is being shutdown, exit SafeModeMonitor thread");
+      }
+    }
+  }
{code}
A cleaner approach is to put the reached timestamp into the constructor of {{SafeModeMonitor()}},
and there is no need to hold the namesystem lock anymore.


> Move the safemode block count into BlockManager
> -----------------------------------------------
>
>                 Key: HDFS-9129
>                 URL: https://issues.apache.org/jira/browse/HDFS-9129
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Haohui Mai
>            Assignee: Mingliang Liu
>         Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch,
HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the NN can get
out of the safemode. These fields can moved to the {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message