hbase-issues mailing list archives

From "Francis Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9457) Master could fail start if region server with system table is down
Date Tue, 10 Sep 2013 00:45:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762532#comment-13762532 ]

Francis Liu commented on HBASE-9457:

For security, the ACL table is assigned just like a user table. It is scanned when it is needed.
If anything goes wrong, the operation is retried, so we don't need it to be available all the
time. It is better for it to be assigned ASAP, but that is not mandatory. So the master can start
up with no problem if the region server holding the ACL table dies at that moment.
To be more accurate, the ACL table can do this because it mirrors all its data in ZK. All
reads go through ZK, but updates are done on the table itself. ACL write commands are invoked
by users, so failures are reflected back to the user, who can try again. For a metrics table,
on the other hand, updates are done by the system itself, so retries have a different effect on
the system. Might that be unacceptable in some scenarios?
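
A toy model of the ACL pattern described above may help. It is a sketch under stated assumptions, not HBase code: the class, its fields and the `TableUnavailable` exception are hypothetical, a dict stands in for the ZooKeeper mirror, and `table_online` stands in for whether the ACL table's region is assigned. It shows reads continuing while the table is offline and a user-issued write failing back to the caller.

```python
class TableUnavailable(Exception):
    """Hypothetical error: the ACL table's region is not assigned."""


class MirroredAclStore:
    """Toy model: writes go to the table, reads come from a mirror that
    stands in for the ACL data kept in ZooKeeper."""

    def __init__(self):
        self.table = {}            # stands in for the ACL table
        self.mirror = {}           # stands in for the ZooKeeper copy
        self.table_online = True   # False while the region is unassigned

    def read(self, user):
        # Reads never touch the table, so they keep working while the
        # table's region is offline.
        return set(self.mirror.get(user, set()))

    def write(self, user, perms):
        # A user-issued write fails fast when the table is offline; the
        # error goes back to the user, who can simply retry later.
        if not self.table_online:
            raise TableUnavailable("ACL table is not assigned")
        self.table[user] = set(perms)
        self.mirror[user] = set(perms)  # propagate the update to the mirror
```

For a metrics table the writer is the system itself, so in this model nobody would catch `TableUnavailable` and retry; the system would need its own retry or buffering, which is the different effect mentioned above.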

For meta, we split the meta log for all previouslyFailedServers if log replay is enabled. We
don't want to split the logs of all failed servers before we declare the system table assigned.
For meta, we have the meta server shutdown handler to assign meta again so that other regions
can be assigned.

This is what I was planning for HBASE-9148 to address: essentially, have a system WAL and the
necessary mechanisms to support it (server shutdown handler, open region handler, etc.). Does
that work for you?
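
The system WAL idea can be pictured with a small hypothetical model. This is not the HBASE-9148 design, only an illustration under assumptions: `RegionServerWals` and `recovery_plan` are invented names, "hbase:acl" and "usertable" are example table names, and lists stand in for log files. Edits are routed by table type so that, after a server dies, only the small system log has to be split before system tables can be reassigned.

```python
class RegionServerWals:
    """Toy model: one log for system-table edits, one for user-table edits."""

    def __init__(self, system_tables):
        self.system_tables = set(system_tables)
        self.system_wal = []
        self.user_wal = []

    def append(self, table, edit):
        # Route each edit by table type so the two logs never mix.
        if table in self.system_tables:
            self.system_wal.append((table, edit))
        else:
            self.user_wal.append((table, edit))


def recovery_plan(wals):
    """Order recovery after the server dies: split the system log first
    so system tables can be reassigned, then the user log."""
    return [("system", list(wals.system_wal)), ("user", list(wals.user_wal))]
```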

If the server holding a system table is dead during a master restart, we need to handle it
specially; otherwise the system table won't be available.
I essentially copied the meta assignment logic to assign system tables. I didn't see this logic
for meta. Or did I miss it?
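
The special handling discussed here can be sketched as one startup step. This is a hypothetical illustration, not the master's actual code path: the function name and its callbacks (`split_wal`, `assign`) are assumptions, and server names are plain strings. If the recorded host is alive nothing changes; if it is dead, only that server's WAL is split and the table is reassigned, rather than waiting for an assignment that will never happen.

```python
def assign_system_table_on_startup(last_host, live_servers, split_wal, assign):
    """Toy sketch of master-startup handling for one system table.

    last_host:    server recorded as hosting the table, or None
    live_servers: set of servers the master currently sees as alive
    split_wal:    callback that splits the WAL of one dead server
    assign:       callback that picks a live server and opens the table there
    Returns the server that hosts the table afterwards.
    """
    if last_host in live_servers:
        # Host is alive: the table is served there as usual.
        return last_host
    if last_host is not None:
        # Host died: split only its WAL, not every failed server's, so the
        # system table is not held up by unrelated recovery work.
        split_wal(last_host)
    # Reassign instead of waiting for an assignment that will never happen.
    return assign(live_servers)
```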

> Master could fail start if region server with system table is down
> ------------------------------------------------------------------
>                 Key: HBASE-9457
>                 URL: https://issues.apache.org/jira/browse/HBASE-9457
>             Project: HBase
>          Issue Type: Bug
>          Components: master, Region Assignment
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Critical
> If the region server holding the system table is killed while the master is starting, the master
> will hang waiting for the system table to be assigned, which won't happen.

This message is automatically generated by JIRA.
