hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-10101) testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
Date Mon, 09 Dec 2013 19:38:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843448#comment-13843448 ]

Jeffrey Zhong edited comment on HBASE-10101 at 12/9/13 7:36 PM:
----------------------------------------------------------------

The ZK cleanup only clears the master address node and the RS nodes, which should be removed
when a cluster is shut down anyway. The added steps make sure we have a clean restart for normal
unit tests; there are special test cases for master (cluster) restart scenarios.

I prefer the test case in TestAssignmentManagerOnCluster because it is about regions not being
assigned during a cluster restart.

Below are my comments on the trunk patch:

{code}
+      regionStates.setLastRegionServerOfRegion(sn, encodedName);
+      if (regionInfo.isMetaRegion()) {
+        // If it's meta region, reset the meta location.
+        // So that master knows the right meta region server.
+        MetaRegionTracker.setMetaLocation(watcher, sn);
+      }
{code}
The above is a little dramatic because we are only setting internal in-memory state to some
server. It will confuse future readers.

{code}
-          if (expireIfOnline(currentMetaServer)) {
+          if (!serverManager.isServerDead(currentMetaServer)) {
{code}
This isn't ideal because there is a possible race condition: a dead meta server may not be
reported (SessionException) in time. We could then skip the meta re-assignment, and the master
would be unable to start.
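
To make the race concrete, here is a minimal, self-contained sketch. It does not use the real HBase classes: FakeServerManager, processExpiry, keepsCurrentMetaLocation and the server name are hypothetical stand-ins, and the decision rule is only an assumption about how a check like !isServerDead(...) might be used. It shows that such a check reflects what the master has been told so far, not the actual state of the server.

```java
import java.util.HashSet;
import java.util.Set;

public class DeadServerRaceSketch {

  /** Hypothetical stand-in for the master's bookkeeping; not the real HBase ServerManager. */
  public static class FakeServerManager {
    private final Set<String> knownDead = new HashSet<>();

    /** Called only when the expiry of a server's session is finally processed. */
    public void processExpiry(String server) {
      knownDead.add(server);
    }

    /** Reflects what the master has been told so far, not the server's actual state. */
    public boolean isServerDead(String server) {
      return knownDead.contains(server);
    }
  }

  /** Assumed decision rule: keep the current meta location unless the server is known dead. */
  public static boolean keepsCurrentMetaLocation(FakeServerManager manager, String metaServer) {
    return !manager.isServerDead(metaServer);
  }

  public static void main(String[] args) {
    FakeServerManager manager = new FakeServerManager();
    String metaServer = "rs1.example.org,60020,1386617888000"; // made-up server name

    // The region server has already crashed, but its expiry has not reached the master yet,
    // so the check still treats the stale meta location as valid.
    System.out.println("before expiry is processed: keep = "
        + keepsCurrentMetaLocation(manager, metaServer));

    // Only after the expiry is processed does the check give the right answer.
    manager.processExpiry(metaServer);
    System.out.println("after expiry is processed: keep = "
        + keepsCurrentMetaLocation(manager, metaServer));
  }
}
```

In the window before the expiry is processed, a decision based only on the dead-server list keeps a meta location that no longer exists, which is the situation described above.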

[~jxiang] Your latest patch looks good to me except for the changes in HMaster.java. I'd
prefer my v3-update patch unless you feel strongly about your trunk patch.

I'll let you decide which to choose and move on. Thanks.





> testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
> ------------------------------------------------------------------
>
>                 Key: HBASE-10101
>                 URL: https://issues.apache.org/jira/browse/HBASE-10101
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jimmy Xiang
>            Assignee: Jeffrey Zhong
>         Attachments: hbase-10101-v2.patch, hbase-10101-v3-update.patch, hbase-10101-v3.patch, hbase-10101.patch, test.log, trunk-10101.patch, trunk-10101_v2.patch
>
>
> Sometimes this test times out. The log is attached. It could be because the new
> cluster takes a while to process the dead server or to assign meta.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
