Return-Path: Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: (qmail 99212 invoked from network); 4 Dec 2010 00:08:32 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 4 Dec 2010 00:08:32 -0000 Received: (qmail 73587 invoked by uid 500); 4 Dec 2010 00:08:32 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 73566 invoked by uid 500); 4 Dec 2010 00:08:32 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 73558 invoked by uid 99); 4 Dec 2010 00:08:32 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Dec 2010 00:08:32 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Dec 2010 00:08:31 +0000 Received: from thor (localhost [127.0.0.1]) by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id oB408Ae1005043 for ; Sat, 4 Dec 2010 00:08:11 GMT Message-ID: <27711927.103951291421290950.JavaMail.jira@thor> Date: Fri, 3 Dec 2010 19:08:10 -0500 (EST) From: "stack (JIRA)" To: issues@hbase.apache.org Subject: [jira] Updated: (HBASE-3298) Regionserver can close during a split causing double assignment In-Reply-To: <16639553.55841291239610887.JavaMail.jira@thor> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3298: ------------------------- Attachment: 3298-addendum.txt Here is an addendum. J-D saw deadlock'ing and dbl-assign because of close and concurrent split. The close got an NSRE because region had been offlined by time close came in. On receipt of an NSRE, the master will force offline -- though in this case there was nothing in zk so it failed -- and then reassign the region. Bad. Trying this patch. > Regionserver can close during a split causing double assignment > --------------------------------------------------------------- > > Key: HBASE-3298 > URL: https://issues.apache.org/jira/browse/HBASE-3298 > Project: HBase > Issue Type: Bug > Reporter: ryan rawson > Assignee: ryan rawson > Priority: Blocker > Fix For: 0.90.0 > > Attachments: 3298-addendum.txt, 3298-v5.txt, am.txt, am.txt, HBASE-3298-2.txt, HBASE-3298-3.txt, HBASE-3298-4.txt, HBASE-3298.txt > > > A regionserver got a close message during a split, which seemed to cause the split to fail, and also caused a double assignment and orphaned region. > The log: > 2010-11-30 18:29:24,310 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0 > f01. after 40sec > 2010-11-30 18:29:24,310 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd > 5f0f01. > 2010-11-30 18:29:24,312 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01.: disabling compac > tions & flushes > 2010-11-30 18:29:24,312 INFO org.apache.hadoop.hbase.regionserver.HRegion: Running close preflush of usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. > 2010-11-30 18:29:24,312 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01 > ., current region memstore size 256.1m > 2010-11-30 18:29:24,312 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores > 2010-11-30 18:29:24,700 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd > 5f0f01., flushing=false, writesEnabled=false > 2010-11-30 18:29:27,291 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. > 2010-11-30 18:29:27,291 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd > 5f0f01., flushing=false, writesEnabled=false > 2010-11-30 18:29:27,961 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01 > /.tmp/1673662210031989562 to hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/1156800948892817282 > 2010-11-30 18:29:27,966 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/1156800948892 > 817282, entries=1157300, sequenceid=14819, memsize=256.1m, filesize=59.4m > 2010-11-30 18:29:27,966 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f > 01. 'IPC Server handler 1 on 60020' > 2010-11-30 18:29:27,966 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. 'IPC Server handler 0 on 60020' > 2010-11-30 18:29:27,967 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. 'IPC Server handler 9 on 60020' > 2010-11-30 18:29:27,967 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. 'IPC Server handler 4 on 60020' > 2010-11-30 18:29:27,967 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. 'IPC Server handler 8 on 60020' > 2010-11-30 18:29:27,967 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. 'IPC Server handler 5 on 60020' > 2010-11-30 18:29:27,966 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. 'IPC Server handler 6 on 60020' > 2010-11-30 18:29:27,966 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~256.1m for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. in 3654ms, sequenceid=14819, compaction requested=true > 2010-11-30 18:29:27,967 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. 'IPC Server handler 7 on 60020' > 2010-11-30 18:29:27,967 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. 'IPC Server handler 3 on 60020' > 2010-11-30 18:29:27,967 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. 'IPC Server handler 2 on 60020' > 2010-11-30 18:29:28,005 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. > 2010-11-30 18:29:28,011 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. > 2010-11-30 18:29:28,012 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2c765e0da906eb Creating unassigned node for 6e260cb69fda466a97f650debd5f0f01 in a CLOSING state > 2010-11-30 18:29:28,074 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. > 2010-11-30 18:29:28,074 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01., current region memstore size 4.8m > 2010-11-30 18:29:28,074 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores > 2010-11-30 18:29:28,079 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01.: disabling compactions & flushes > 2010-11-30 18:29:28,332 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/.tmp/1412284505922212951 to hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/3835216797592840515 > 2010-11-30 18:29:28,345 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/3835216797592840515, entries=21600, sequenceid=14835, memsize=4.8m, filesize=1.1m > 2010-11-30 18:29:28,365 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~4.8m for region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. in 291ms, sequenceid=14835, compaction requested=true > 2010-11-30 18:29:28,381 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed data > 2010-11-30 18:29:28,381 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. > 2010-11-30 18:29:28,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2c765e0da906eb Attempting to transition node 6e260cb69fda466a97f650debd5f0f01 from RS_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED > 2010-11-30 18:29:28,466 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2c765e0da906eb Successfully transitioned node 6e260cb69fda466a97f650debd5f0f01 from RS_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED > 2010-11-30 18:29:28,466 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01. > 2010-11-30 18:29:29,000 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: usertable,user1476676931,1291170472014.829c72aa4c47a31bd71b735da7e33883. > 2010-11-30 18:29:29,001 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of usertable,user1476676931,1291170472014.829c72aa4c47a31bd71b735da7e33883. > 2010-11-30 18:29:29,013 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2c765e0da906eb Creating unassigned node for 829c72aa4c47a31bd71b735da7e33883 in a CLOSING state > 2010-11-30 18:29:29,079 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback of failed split of usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01.; org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/splits/019cf676872050ee05c15a6a37655e8a/data/5976204784865149447.6e260cb69fda466a97f650debd5f0f01 File does not exist. [Lease. Holder: DFSClient_hb_rs_sv4borg226,60020,1291168733786_1291168755622, pendingcreates: 1] > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1378) > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1369) > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1424) > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1412) > at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:491) > at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962) > 2010-11-30 18:29:29,107 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing usertable,user1476676931,1291170472014.829c72aa4c47a31bd71b735da7e33883.: disabling compactions & flushes > 2010-11-30 18:29:29,107 INFO org.apache.hadoop.hbase.regionserver.HRegion: Running close preflush of usertable,user1476676931,1291170472014.829c72aa4c47a31bd71b735da7e33883. > 2010-11-30 18:29:29,108 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user1476676931,1291170472014.829c72aa4c47a31bd71b735da7e33883., current region memstore size 63.4m > 2010-11-30 18:29:29,108 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores > 2010-11-30 18:29:29,118 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/1156800948892817282, isReference=false, isBulkLoadResult=false, seqid=14819, majorCompaction=false > 2010-11-30 18:29:29,163 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/1485319185163827758, isReference=false, isBulkLoadResult=false, seqid=12873, majorCompaction=false > 2010-11-30 18:29:29,179 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/2866498733799966821, isReference=false, isBulkLoadResult=false, seqid=12619, majorCompaction=false > 2010-11-30 18:29:29,219 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/3835216797592840515, isReference=false, isBulkLoadResult=false, seqid=14835, majorCompaction=false > 2010-11-30 18:29:29,237 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/444462617223715350, isReference=false, isBulkLoadResult=false, seqid=12005, majorCompaction=false > 2010-11-30 18:29:29,247 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/5536858845826091476, isReference=false, isBulkLoadResult=false, seqid=13173, majorCompaction=false > 2010-11-30 18:29:29,254 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/5976204784865149447, isReference=false, isBulkLoadResult=false, seqid=13427, majorCompaction=false > 2010-11-30 18:29:29,275 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/8614929289191582955, isReference=false, isBulkLoadResult=false, seqid=6111, majorCompaction=false > 2010-11-30 18:29:29,281 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://sv4borg235:9000/hbase-rc1/usertable/6e260cb69fda466a97f650debd5f0f01/data/9150811729171627958, isReference=false, isBulkLoadResult=false, seqid=12265, majorCompaction=false > 2010-11-30 18:29:29,283 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01.; next sequenceid=14836 > 2010-11-30 18:29:29,284 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Successful rollback of failed split of usertable,user877196232,1291170014467.6e260cb69f > da466a97f650debd5f0f01. > 2010-11-30 18:29:29,284 INFO org.apache.hadoop.hbase.regionserver.HRegion: NOT compacting region usertable,user877196232,1291170014467.6e260cb69fda466a97f650debd5f0f01.: compacting=false, writesEnabled=false -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.