hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9514) Prevent region from assigning before log splitting is done
Date Mon, 16 Sep 2013 21:41:55 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768813#comment-13768813 ]

stack commented on HBASE-9514:
------------------------------

Here you are adding back the random region move to try and bring on the issue again:

-        new FlushRandomRegionOfTableAction(tableName)
+        new FlushRandomRegionOfTableAction(tableName),
+        new MoveRandomRegionOfTableAction(tableName)

Why do we need this?

-    this.maximumAttempts =
-      this.server.getConfiguration().getInt("hbase.assignment.maximum.attempts", 10);
+    this.maximumAttempts = Math.max(1,
+      this.server.getConfiguration().getInt("hbase.assignment.maximum.attempts", 10));

Could it be configured as zero?  Are you saying we should try at least once?
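
To make the question concrete, here is a minimal sketch of what the clamp does, assuming the intent is "never fewer than one attempt". The class name MaxAttemptsSketch is hypothetical, and java.util.Properties stands in for the HBase Configuration object so the snippet runs on its own:

```java
import java.util.Properties;

// Hypothetical stand-in for the patched constructor logic; Properties
// replaces the HBase Configuration object so the sketch is self-contained.
final class MaxAttemptsSketch {
  static final String KEY = "hbase.assignment.maximum.attempts";
  static final int DEFAULT_ATTEMPTS = 10;

  // Mirrors Math.max(1, conf.getInt(KEY, 10)): a configured value of zero
  // or less is raised to one, so assignment is always tried at least once.
  static int maximumAttempts(Properties conf) {
    String raw = conf.getProperty(KEY);
    int configured = (raw == null) ? DEFAULT_ATTEMPTS : Integer.parseInt(raw.trim());
    return Math.max(1, configured);
  }
}
```

With the key unset this yields the default of 10; with the key set to 0 or a negative number it yields 1; larger values pass through unchanged.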

I suppose acquireLock has to be public because SSH wants to use it too?

+  public Lock acquireLock(final String encodedName) {

How long do servers hang around in the dead servers list?

{code}
+        if (!region.isMetaRegion() &&
+            regionStates.wasRegionOnDeadServer(encodedName)) {
+          LOG.info("Skip assigning " + region.getRegionNameAsString()
+            + " because it's host " + regionStates.getLastRegionServerOfRegion(encodedName)
+            + " is dead but not processed");
+          // Make sure the region is offline so that SSH will assign it.
+          // Need to make sure we don't race with SSH.
+          regionOffline(region);
+          return;
+        }
{code}

I suppose it doesn't matter if a server stays in the dead servers list for a long time, since each server has a startcode?
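
A rough sketch of why the startcode would make a long-lived entry harmless, assuming dead servers are tracked by their full name (host, port, startcode). DeadServerSketch is a hypothetical, simplified model, not the real ServerName or dead-server classes, and the host, port, and timestamps are made up:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; not the real HBase classes.
final class DeadServerSketch {
  private final Set<String> deadServers = new HashSet<>();

  // Identity of one server instance: host, port, and start timestamp.
  static String serverName(String host, int port, long startCode) {
    return host + "," + port + "," + startCode;
  }

  void markDead(String serverName) {
    deadServers.add(serverName);
  }

  // A restarted server on the same host and port carries a new startcode,
  // so it does not match the entry left behind by the dead instance.
  boolean isDead(String serverName) {
    return deadServers.contains(serverName);
  }
}
```

Under that assumption, an old entry only ever matches the instance that actually died, however long it lingers.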

Does this big block of new code have to go into the middle of assign?  Can it be broken up
a little into methods that are easier to grok?

{code}
+        if (serverManager.isServerOnline(server) &&
+            (t instanceof java.net.SocketTimeoutException ||
+                t instanceof java.net.ConnectException)) {
{code}

Is it a good idea to insert this wait here for every exception?  What if the exception is
an NSRE (NotServingRegionException)?  Doesn't an NSRE indicate a live server?
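
For reference, a small sketch of how the quoted condition classifies exceptions, assuming the wait is guarded by exactly this check (the hunk above does not confirm that). RetryWaitSketch and shouldWaitBeforeRetry are hypothetical names, and the boolean parameter stands in for serverManager.isServerOnline(server):

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

// Hypothetical helper that isolates the condition from the hunk above.
final class RetryWaitSketch {
  // True only when the server is still considered online and the failure
  // looks like a network problem (timeout or refused connection).
  static boolean shouldWaitBeforeRetry(boolean serverOnline, Throwable t) {
    return serverOnline
        && (t instanceof SocketTimeoutException || t instanceof ConnectException);
  }
}
```

Read this way, any other exception type, including an NSRE, falls through without matching, so it matters whether the wait sits inside or outside this if.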

The big change in the middle I cannot follow.  Can we have a note on what it does?

I'd say declare and assign in one go:

+    lastAssignments = new HashMap<String, ServerName>();

I like this map in RS.
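
Something along these lines is what I mean by declaring and assigning in one go. This is a sketch only: LastAssignmentsSketch and its two methods are hypothetical, and String stands in for ServerName so it compiles without HBase on the classpath:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder class; String replaces ServerName for self-containment.
final class LastAssignmentsSketch {
  // Declared and initialized together, and final so it is never reassigned.
  private final Map<String, String> lastAssignments = new HashMap<String, String>();

  // Remember which server last hosted the region with this encoded name.
  void recordAssignment(String encodedName, String serverName) {
    lastAssignments.put(encodedName, serverName);
  }

  // Returns the last known server, or null if the region was never recorded.
  String lastServerOf(String encodedName) {
    return lastAssignments.get(encodedName);
  }
}
```

Initializing at the declaration keeps the field from ever being observed as null and saves a line in the constructor.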

Good stuff, Jimmy.

> Prevent region from assigning before log splitting is done
> ----------------------------------------------------------
>
>                 Key: HBASE-9514
>                 URL: https://issues.apache.org/jira/browse/HBASE-9514
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Blocker
>         Attachments: trunk-9514_v1.patch
>
>
> If a region is assigned before log splitting is done by the server shutdown handler, the edits belonging to this region in the hlogs of the dead server will be lost.
> Generally this is not an issue if users don't assign/unassign a region from the hbase shell or via hbase admin. These commands are marked for experts only in the hbase shell help too. However, chaos monkey doesn't care.
> If we can prevent such regions from being assigned at a bad time, it would make things a little safer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
