hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1104) Doubly-assigned regions redux
Date Thu, 08 Jan 2009 04:44:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661840#action_12661840 ]

stack commented on HBASE-1104:
------------------------------

Did you mean to include the changes to src/webapps/master/WEB-INF/web.xml?

Want to add more javadoc to the @return below? (Not important...)

Index: src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
===================================================================
--- src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java  (revision 732591)
+++ src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java  (working copy)
@@ -126,6 +126,7 @@
    * @param regionName name of the region to update
    * @param b BatchUpdate
    * @param expectedValues map of column names to expected data values.
+   * @return true if 

Tell me about this change:

         storedInfo = this.master.serverManager.getServerInfo(serverName);
         deadServer = this.master.serverManager.isDead(serverName);
-        deadServerAndLogsSplit =
-          this.master.serverManager.isDeadServerLogsSplit(serverName);


and...


-      if ((deadServerAndLogsSplit ||
-          (!deadServer && (storedInfo == null ||
-            (storedInfo.getStartCode() != startCode)))) &&
-          this.regionManager.assignable(info)) {
+      if ((deadServer ||
+          (storedInfo == null || storedInfo.getStartCode() != startCode))) {
+

It doesn't look right.  The changes I made for 1099 were "allow assigning if it's a dead server and
its commit logs HAVE been split" or "if NOT a dead server... because if it's a dead server and it
didn't pass the first test, then its logs are being split".  We don't want BaseScanner assigning
to servers on the dead list.  If regions are assigned to a server on the dead list, then when the
dead server runs its scan in the shutdown handler, we'll reassign these regions as though they'd
been on the crashed server; that makes for double assignment and a mess.
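
To spell out the check as it stood before this patch, here is a minimal standalone sketch reconstructed from the removed lines above.  The class name, the flattened boolean parameters and the nullable stored start code are mine for illustration only; this is not the master code and not HBase API.

```java
// Hypothetical sketch of the pre-patch assignment condition.
// Parameter names mirror the variables in the diff; nothing here is HBase API.
final class AssignmentCheckSketch {

  /**
   * @param deadServer             server is on the master's dead-server list
   * @param deadServerAndLogsSplit server is dead AND its commit logs have been split
   * @param storedStartCode        start code the master holds for the server,
   *                               or null when no ServerInfo is stored
   * @param startCode              start code recorded for the region being scanned
   * @param assignable             stand-in for regionManager.assignable(info)
   * @return true if the scanner may go ahead and assign the region
   */
  static boolean shouldAssign(boolean deadServer, boolean deadServerAndLogsSplit,
      Long storedStartCode, long startCode, boolean assignable) {
    // Server is unknown to the master, or has restarted since the row was written.
    boolean unknownOrRestarted =
        storedStartCode == null || storedStartCode.longValue() != startCode;
    // Either the logs are already split, or the server is not on the dead list.
    // A dead server whose logs are still being split fails both branches.
    boolean serverAllows = deadServerAndLogsSplit || (!deadServer && unknownOrRestarted);
    return serverAllows && assignable;
  }

  public static void main(String[] args) {
    // Dead server, logs still being split: the case the patch would start assigning.
    System.out.println(shouldAssign(true, false, null, 1L, true));
  }
}
```

With the patched condition (deadServer || storedInfo == null || start codes differ), the dead-but-not-yet-split case would pass, which is the case I'm worried about.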

You also remove the new method assignable.  Don't we want to check if region is 'assignable'
before dropping into this assigning code block? (Not sure... so asking).

Your patch does this, which as discussed on IRC is not what's wanted:

{code}
@@ -1088,12 +1088,8 @@
       byte [] closestKey = store.getRowKeyAtOrBefore(row);
       // If it happens to be an exact match, we can stop looping.
       // Otherwise, we need to check if it's the max and move to the next
-      if (HStoreKey.equalsTwoRowKeys(regionInfo, row, closestKey)) {
+      if (closestKey != null) {
         key = new HStoreKey(closestKey, this.regionInfo);
-      } else if (closestKey != null &&
-          (key == null || HStoreKey.compareTwoRowKeys(
-              regionInfo,closestKey, key.getRow()) > 0) ) {
-        key = new HStoreKey(closestKey, this.regionInfo);
       } else {
         return null;
       }
{code}

Jim, do you think the change below is safe?

{code}
@@ -564,9 +566,10 @@
       //       the messages we've received. In this case, a close could be
       //       processed before an open resulting in the master not agreeing on
       //       the region's state.
+      master.regionManager.setClosed(region.getRegionName());
{code}

Will we have the problem where state changes are processed out of order?  Thinking on it,
it doesn't seem so but asking just to check.

I'll hold off on testing the patch until I hear back on the above.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

