hbase-issues mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely
Date Thu, 04 May 2017 18:41:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997200#comment-15997200 ]

Josh Elser commented on HBASE-16488:

@@ -2599,11 +2625,26 @@ public class HMaster extends HRegionServer implements MasterServices, Server {
   void checkNamespaceManagerReady() throws IOException {
-    if (tableNamespaceManager == null ||
-        !tableNamespaceManager.isTableAvailableAndInitialized(true)) {
+    if (tableNamespaceManager == null) {
       throw new IOException("Table Namespace Manager not ready yet, try again later");
+    } else if (!tableNamespaceManager.isTableAvailableAndInitialized(true)) {
+      try {
+        // Wait some time.
+        long startTime = EnvironmentEdgeManager.currentTime();
+        int timeout = conf.getInt("hbase.master.namespace.waitforready", 30000);
+        while (!tableNamespaceManager.isTableNamespaceManagerStarted() &&
+            EnvironmentEdgeManager.currentTime() - startTime < timeout) {
+          Thread.sleep(100);
+        }
+      } catch (InterruptedException e) {
+        throw (InterruptedIOException) new InterruptedIOException().initCause(e);
+      }
+      if (!tableNamespaceManager.isTableNamespaceManagerStarted()) {
+        throw new IOException("Table Namespace Manager not fully initialized, try again later");
+      }

This sits a little funny with me. Ideally, we'd have the caller do the sleeping so that we're
not blocking a thread inside of the Master (or worse, an RPC handler). Your change here is
definitely easier to implement, but I wonder how hard it would be to keep the exception throw
and implement retry logic in the callers (other methods in HMaster or the HBase client).
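
A minimal sketch of what caller-side retry could look like. Everything here is hypothetical and not part of the patch: {{NamespaceCheck}} stands in for a fail-fast call such as {{checkNamespaceManagerReady()}}, and the class name, attempt count, and pause are illustrative only.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

class RetryingCaller {
  /** Hypothetical stand-in for a call that fails fast while the namespace manager is not ready. */
  interface NamespaceCheck {
    void check() throws IOException;
  }

  /**
   * Invokes check up to maxAttempts times, sleeping pauseMs after each failure
   * except the last. Returns the number of attempts used, or rethrows the last
   * IOException if every attempt failed.
   */
  static int callWithRetries(NamespaceCheck check, int maxAttempts, long pauseMs)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        check.check();
        return attempt;
      } catch (InterruptedIOException e) {
        throw e; // an interrupt is not a "try again later" condition
      } catch (IOException e) {
        last = e; // remember the failure and retry
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(pauseMs); // the caller's thread waits, not a Master thread
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw (InterruptedIOException) new InterruptedIOException().initCause(ie);
        }
      }
    }
    throw last;
  }
}
```

With this shape the Master-side check stays a cheap fail-fast test, and the waiting happens on whichever thread issued the request.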

Unrelated: shouldn't {{tableNamespaceManager}} be volatile if we're checking it across different
threads? Or, make it final and use an {{AtomicReference}}?
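
To illustrate the second option, here is a small sketch of a final {{AtomicReference}} holder. {{ManagerHolder}} and {{Manager}} are hypothetical stand-ins (not {{HMaster}} or {{TableNamespaceManager}}), so treat it as an outline of the pattern rather than a proposed patch.

```java
import java.util.concurrent.atomic.AtomicReference;

class ManagerHolder {
  /** Hypothetical stand-in for the real manager class. */
  static final class Manager {
    private volatile boolean started;

    void markStarted() {
      started = true;
    }

    boolean isStarted() {
      return started;
    }
  }

  // The field itself is final; only the value held inside the reference changes.
  private final AtomicReference<Manager> manager = new AtomicReference<>();

  /** Called by the initialization thread. Returns false if a manager was already published. */
  boolean publish(Manager m) {
    return manager.compareAndSet(null, m);
  }

  /** Called by request threads; a published manager is visible without extra locking. */
  boolean isReady() {
    Manager m = manager.get();
    return m != null && m.isStarted();
  }
}
```

A plain {{volatile}} field gives the same visibility guarantee; the {{AtomicReference}} form mainly adds the compare-and-set, which guards against publishing twice.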


@@ -93,7 +94,7 @@ public class TableNamespaceManager {
       long startTime = EnvironmentEdgeManager.currentTime();
       int timeout = conf.getInt(NS_INIT_TIMEOUT, DEFAULT_NS_INIT_TIMEOUT);
       while (!isTableAvailableAndInitialized(false)) {
-        if (EnvironmentEdgeManager.currentTime() - startTime + 100 > timeout) {
+        if (EnvironmentEdgeManager.currentTime() - startTime > timeout) {
           // We can't do anything if ns is not online.
           throw new IOException("Timedout " + timeout + "ms waiting for namespace table to " +
             "be assigned");

Do you know why we were previously padding the elapsed time by 100ms?
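
One way to look at this question is to simulate the loop with a fake clock. The sketch below assumes the loop sleeps a fixed interval between checks; that sleep is not visible in the hunk, so it is an assumption, and {{TimeoutPadding}} is a made-up name. It only shows what the padding changes arithmetically, not why it was added.

```java
class TimeoutPadding {
  /**
   * Simulates "check deadline, then sleep" with a fake clock and returns the
   * simulated elapsed time at which the timeout check fires. padMs models the
   * "+ 100" term in the old condition; pass 0 for the new condition.
   */
  static long elapsedAtTimeout(long timeoutMs, long sleepMs, long padMs) {
    if (sleepMs <= 0) {
      throw new IllegalArgumentException("sleepMs must be positive");
    }
    long elapsed = 0;
    while (true) {
      if (elapsed + padMs > timeoutMs) {
        return elapsed;
      }
      elapsed += sleepMs; // stands in for the assumed Thread.sleep(sleepMs)
    }
  }
}
```

Under that assumption, with a 1000 ms timeout and a 100 ms sleep, the padded check gives up at 1000 ms and the unpadded one at 1100 ms. So the padding would keep the loop from starting a sleep that would run past the timeout.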

diff --git hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
index f60be66..c75d4bc 100644
--- hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
+++ hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
@@ -105,6 +105,7 @@ import org.apache.hadoop.hbase.security.HBaseKerberosUtils;
 import org.apache.hadoop.hbase.security.User;
 import org.apache.hadoop.hbase.security.visibility.VisibilityLabelsCache;
 import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.hadoop.hbase.util.FSTableDescriptors;
 import org.apache.hadoop.hbase.util.FSUtils;
 import org.apache.hadoop.hbase.util.JVMClusterUtil;
@@ -1459,6 +1460,7 @@ public class HBaseTestingUtility extends HBaseCommonTestingUtility {
+    waitUntilTableNamespaceManagerStarted();
     getHBaseAdmin().createTable(desc, startKey, endKey, numRegions);
     // HBaseAdmin only waits for regions to appear in hbase:meta we should wait until they are assigned
@@ -1497,6 +1499,7 @@ public class HBaseTestingUtility extends HBaseCommonTestingUtility {
+    waitUntilTableNamespaceManagerStarted();
     getHBaseAdmin().createTable(htd, splitKeys);
     // HBaseAdmin only waits for regions to appear in hbase:meta we should wait until they
     // assigned

Do this once in {{MiniHBaseCluster startMiniHBaseCluster(int, int, Class, Class, boolean,
boolean)}} instead of having it littered across HBaseTestingUtility?
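
For illustration, a generic polling helper of the sort that could be called once at mini-cluster startup. {{StartupWait.awaitCondition}} is a hypothetical name. This is not the body of {{waitUntilTableNamespaceManagerStarted()}}, which the hunk does not show.

```java
import java.util.function.BooleanSupplier;

class StartupWait {
  /**
   * Polls condition every intervalMs until it returns true or timeoutMs has
   * elapsed. Returns true if the condition became true, false on timeout.
   */
  static boolean awaitCondition(BooleanSupplier condition, long timeoutMs, long intervalMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(intervalMs);
    }
    return true;
  }
}
```

Calling a helper like this once, right after the cluster is up, would let the individual {{createTable}} wrappers stay unchanged.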

Nice test additions!

> Starting namespace and quota services in master startup asynchronizely
> ----------------------------------------------------------------------
>                 Key: HBASE-16488
>                 URL: https://issues.apache.org/jira/browse/HBASE-16488
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>         Attachments: HBASE-16488.v1-branch-1.patch, HBASE-16488.v1-master.patch, HBASE-16488.v2-branch-1.patch,
HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, HBASE-16488.v3-branch-1.patch,
HBASE-16488.v4-branch-1.patch, HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch
> From time to time, during internal IT tests and from customers, we see master initialization
fail because the namespace table region takes a long time to assign (e.g. sometimes log splitting
takes a long time or hangs; sometimes an RS is temporarily unavailable; sometimes there is an
unknown assignment issue).  In the past, there were proposals to improve this situation,
e.g. HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region assignment) or HBASE-13557
(Special WAL handling for system tables) or HBASE-14623 (Implement dedicated WAL for system
> This JIRA proposes another way to solve this master initialization failure: the namespace
service is only used by a handful of operations (e.g. create table / namespace DDL / get namespace
API / some RS group DDL).  Only the quota manager depends on it, and quota management is off by
default.  Therefore, the namespace service is not really needed for the master to be functional,
so we could start the namespace service asynchronously without blocking master startup.
