hbase-issues mailing list archives

From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely
Date Tue, 18 Jul 2017 13:51:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091568#comment-16091568

Stephen Yuan Jiang commented on HBASE-16488:

The V10 patch for branch-1 has been approved by [~enis].

Most tests passed in pre-commit.  For the failed UTs, I checked the source code and don't think
the failures are related to this change.  I re-ran those tests locally, and all except one passed.

The only test that fails consistently on my local machine is {{org.apache.hadoop.hbase.regionserver.TestRSKilledWhenInitializing.testRSTerminationAfterRegisteringToMasterBeforeCreatingEphemeralNode}}
- I spent some time debugging it and don't think the failure is related to this change.  The test
kills one RS and asserts that the server manager thinks this RS is not online.  Without any change,
the test passes consistently on my local machine.  When I added some logging to the test (just
some LOG.info statements inside the test, no other changes) to see what was going on, it
failed consistently because the server manager thought the RS was still online.  If I add a wait before
the assert, the test passes with a wait of about 600ms on my local machine.  This is with only
LOG.info messages added to the test and no real change.  There seems to be a delay between "mini cluster
get live server thinks the RS is dead" and "master server manager removes the RS from the online
server list".  With the patch, the same is true: with a delay of about 600ms (which has nothing to do
with namespace), the test passes.  I think this is a test issue.  If it consistently reproduces
in pre-commit, I will fix the test in a separate JIRA.
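
For illustration only, the "wait before assert" workaround described above could be written as a bounded poll rather than a fixed 600ms sleep.  This is a minimal, self-contained sketch, not the actual test fix: the class {{WaitForSketch}}, the helper {{waitFor}}, and the {{rsRemoved}} condition are hypothetical names, and the condition is only a stand-in for "server manager no longer lists the killed RS as online".

```java
import java.util.function.BooleanSupplier;

public class WaitForSketch {

    /**
     * Polls the condition every intervalMs until it returns true or
     * timeoutMs has elapsed. Returns true if the condition was met in time.
     * (Hypothetical helper, not an HBase API.)
     */
    public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false; // gave up: condition never became true
            }
            Thread.sleep(intervalMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Assumption: stands in for the check that the server manager has
        // removed the killed RS from its online server list; here it simply
        // becomes true after roughly 600ms.
        BooleanSupplier rsRemoved = () -> System.currentTimeMillis() - start >= 600;

        boolean ok = waitFor(5000, 50, rsRemoved);
        System.out.println("condition met: " + ok);
    }
}
```

A bounded poll returns as soon as the state converges and fails only when the timeout expires, so it does not depend on a machine-specific delay such as the 600ms seen locally.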

> Starting namespace and quota services in master startup asynchronizely
> ----------------------------------------------------------------------
>                 Key: HBASE-16488
>                 URL: https://issues.apache.org/jira/browse/HBASE-16488
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>         Attachments: HBASE-16488.v10-branch-1.patch, HBASE-16488.v1-branch-1.patch, HBASE-16488.v1-master.patch,
HBASE-16488.v2-branch-1.patch, HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch,
HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, HBASE-16488.v5-branch-1.patch,
HBASE-16488.v6-branch-1.patch, HBASE-16488.v7-branch-1.patch, HBASE-16488.v8-branch-1.patch,
> During internal IT tests and from customers, we often see master initialization
fail because the namespace table region takes a long time to assign (e.g. sometimes log splitting takes
a long time or hangs; sometimes an RS is temporarily unavailable; sometimes there is some
unknown assignment issue).  In the past, there were some proposals to improve this situation,
e.g. HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region assignment) or HBASE-13557
(Special WAL handling for system tables) or HBASE-14623 (Implement dedicated WAL for system
> This JIRA proposes another way to solve this master initialization failure: the namespace
service is only used by a handful of operations (e.g. create table / namespace DDL / get namespace
API / some RS group DDL).  Only the quota manager depends on it, and quota management is off by
default.  Therefore, the namespace service is not really needed for the master to be functional,
so we could start the namespace service asynchronously without blocking master startup.
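
The general idea in the description above (start a slow, rarely needed service in the background and reject only the operations that depend on it until it is ready) can be sketched as follows.  This is a generic illustration and not the code in the attached patches: {{AsyncInitSketch}}, {{isNamespaceReady}}, {{awaitNamespace}}, and {{createTable}} are hypothetical names that do not correspond to HBase classes or methods.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AsyncInitSketch {

    // Completes when the (hypothetical) namespace initialization finishes.
    private final CompletableFuture<Void> namespaceReady;

    /** The constructor returns immediately; the slow init runs in the background. */
    public AsyncInitSketch(Runnable slowNamespaceInit) {
        this.namespaceReady = CompletableFuture.runAsync(slowNamespaceInit);
    }

    public boolean isNamespaceReady() {
        return namespaceReady.isDone() && !namespaceReady.isCompletedExceptionally();
    }

    /** Blocks the caller (not startup) until the init finishes or the timeout expires. */
    public void awaitNamespace(long timeout, TimeUnit unit) throws Exception {
        namespaceReady.get(timeout, unit);
    }

    /** An operation that needs the namespace service fails fast until it is ready. */
    public String createTable(String name) {
        if (!isNamespaceReady()) {
            throw new IllegalStateException("namespace service not ready");
        }
        return "created " + name;
    }

    public static void main(String[] args) throws Exception {
        AsyncInitSketch master = new AsyncInitSketch(() -> {
            try {
                Thread.sleep(300); // stand-in for slow namespace table assignment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("startup finished, namespace ready: " + master.isNamespaceReady());
        master.awaitNamespace(5, TimeUnit.SECONDS);
        System.out.println(master.createTable("t1"));
    }
}
```

In this sketch, operations that do not touch the namespace service are never blocked, which is the property the proposal relies on.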

This message was sent by Atlassian JIRA
