Message-ID: <1282773133.1231389886581.JavaMail.jira@brutus>
Date: Wed, 7 Jan 2009 20:44:46 -0800 (PST)
From: "stack (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1104) Doubly-assigned regions redux
In-Reply-To: <769181966.1230764926216.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661840#action_12661840 ]

stack commented on HBASE-1104:
------------------------------

Did you mean to add in
changes to Index: src/webapps/master/WEB-INF/web.xml?

Want to add more javadoc to the @return in the below? (Not important...)

Index: src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
===================================================================
--- src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java (revision 732591)
+++ src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java (working copy)
@@ -126,6 +126,7 @@
    * @param regionName name of the region to update
    * @param b BatchUpdate
    * @param expectedValues map of column names to expected data values.
+   * @return true if

Tell me about this change:

     storedInfo = this.master.serverManager.getServerInfo(serverName);
     deadServer = this.master.serverManager.isDead(serverName);
-    deadServerAndLogsSplit =
-      this.master.serverManager.isDeadServerLogsSplit(serverName);

and...

-    if ((deadServerAndLogsSplit ||
-        (!deadServer && (storedInfo == null ||
-          (storedInfo.getStartCode() != startCode)))) &&
-        this.regionManager.assignable(info)) {
+    if ((deadServer ||
+        (storedInfo == null || storedInfo.getStartCode() != startCode))) {
+

It doesn't look right. The changes I made for 1099 were "allow assigning if it's a dead server and its commit logs HAVE been split" or "if NOT a dead server....because if a dead server and didn't pass first test, then its logs are being split.." ... We don't want BaseScanner assigning to servers on the dead list. If regions are assigned to a server on the dead list, then when the dead server runs its scan in the shutdown handler, we'll reassign these regions as though they'd been on the crashed server; that makes for double assignment and a mess.

You also remove the new method assignable. Don't we want to check if a region is 'assignable' before dropping into this assigning code block? (Not sure... so asking).

Your patch does this, which as discussed on IRC is not what's wanted:

{code}
@@ -1088,12 +1088,8 @@
         byte [] closestKey = store.getRowKeyAtOrBefore(row);
         // If it happens to be an exact match, we can stop looping.
         // Otherwise, we need to check if it's the max and move to the next
-        if (HStoreKey.equalsTwoRowKeys(regionInfo, row, closestKey)) {
+        if (closestKey != null) {
           key = new HStoreKey(closestKey, this.regionInfo);
-        } else if (closestKey != null &&
-            (key == null || HStoreKey.compareTwoRowKeys(
-            regionInfo,closestKey, key.getRow()) > 0) ) {
-          key = new HStoreKey(closestKey, this.regionInfo);
         } else {
           return null;
         }
{code}

Do you think this is safe, Jim, in the below?

{code}
@@ -564,9 +566,10 @@
   // the messages we've received. In this case, a close could be
   // processed before an open resulting in the master not agreeing on
   // the region's state.
+  master.regionManager.setClosed(region.getRegionName());
{code}

Will we have the problem where state changes are processed out of order? Thinking on it, it doesn't seem so, but asking just to check.

I'll hold off on testing the patch until I have an answer on the above.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch
>
>
> Testing, I see doubly assigned regions. Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
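
For reference, the assignment condition argued for in the comment above (the one the patch removes) can be looked at in isolation. What follows is a minimal, hypothetical Java sketch, not HBase code: the class name AssignmentCheckSketch, the method shouldAssign, and its flattened parameters are made up for illustration. It assumes that isDeadServerLogsSplit means "server is on the dead list and its commit logs have been split", and it stands in a plain boolean for regionManager.assignable(info).

```java
// Hypothetical, self-contained sketch of the pre-patch condition; not the HBase source.
final class AssignmentCheckSketch {

  /**
   * @param deadServer          true if the server named in .META. is on the dead list
   * @param deadServerLogsSplit true if that dead server's commit logs have been split
   * @param storedStartCode     start code the master holds for the server name,
   *                            or null if the master has no info for it
   * @param startCode           start code recorded with the region in .META.
   * @param assignable          stand-in (assumption) for regionManager.assignable(info)
   * @return true if the scanner may assign the region
   */
  static boolean shouldAssign(boolean deadServer, boolean deadServerLogsSplit,
      Long storedStartCode, long startCode, boolean assignable) {
    // Assumed meaning of isDeadServerLogsSplit: dead AND logs already split.
    boolean deadAndLogsSplit = deadServer && deadServerLogsSplit;
    // Server is unknown to the master, or it restarted with a new start code.
    boolean unknownOrRestarted =
        storedStartCode == null || storedStartCode.longValue() != startCode;
    return (deadAndLogsSplit || (!deadServer && unknownOrRestarted)) && assignable;
  }

  public static void main(String[] args) {
    // Dead server whose logs are still being split: leave it to the shutdown handler.
    System.out.println(shouldAssign(true, false, null, 1L, true));
    // Dead server whose logs have been split: safe to assign.
    System.out.println(shouldAssign(true, true, null, 1L, true));
    // Live server with a matching start code: the region is where .META. says it is.
    System.out.println(shouldAssign(false, false, 5L, 5L, true));
  }
}
```

Under these assumptions, the only difference from the patched condition (deadServer || storedInfo == null || start codes differ) is the first case in main: a dead server whose logs are not yet split is refused here but accepted by the patch, which is the double-assignment window described in the comment.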