Date: Fri, 20 Jun 2014 18:41:26 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11380) HRegion lock object is not being released properly, leading to snapshot failure

    [ https://issues.apache.org/jira/browse/HBASE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039164#comment-14039164 ]

Hadoop QA commented on HBASE-11380:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12651687/11380-v2.txt
  against trunk revision .
  ATTACHMENT ID: 12651687

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100.

    {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestZKLessSplitOnCluster
                       org.apache.hadoop.hbase.regionserver.TestEncryptionKeyRotation
                       org.apache.hadoop.hbase.security.access.TestTablePermissions
                       org.apache.hadoop.hbase.master.TestAssignmentListener
                       org.apache.hadoop.hbase.client.TestFromClientSide3
                       org.apache.hadoop.hbase.master.TestMaster
                       org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
                       org.apache.hadoop.hbase.io.encoding.TestChangingEncoding
                       org.apache.hadoop.hbase.client.TestAdmin
                       org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                       org.apache.hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster
                       org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
                       org.apache.hadoop.hbase.migration.TestNamespaceUpgrade
                       org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
                       org.apache.hadoop.hbase.regionserver.TestTags
                       org.apache.hadoop.hbase.mapreduce.TestSecureLoadIncrementalHFilesSplitRecovery
                       org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
                       org.apache.hadoop.hbase.client.TestFromClientSide
                       org.apache.hadoop.hbase.util.TestHBaseFsck
                       org.apache.hadoop.hbase.io.encoding.TestLoadAndSwitchEncodeOnDisk
                       org.apache.hadoop.hbase.regionserver.TestZKLessMergeOnCluster
                       org.apache.hadoop.hbase.TestIOFencing
                       org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence
                       org.apache.hadoop.hbase.regionserver.TestCompactionState
                       org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
                       org.apache.hadoop.hbase.master.TestTableLockManager
                       org.apache.hadoop.hbase.rest.TestTableResource
                       org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9806//console

This message is automatically generated.
> HRegion lock object is not being released properly, leading to snapshot failure
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-11380
>                 URL: https://issues.apache.org/jira/browse/HBASE-11380
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.3
>            Reporter: Craig Condit
>            Assignee: Ted Yu
>             Fix For: 0.99.0, 0.98.4
>
>         Attachments: 11380-v1.txt, 11380-v2.txt, HBASE-11380-v2-0.98.3.txt
>
>
> Background:
> We are attempting to create ~ 750 table snapshots on a nightly basis for use in MR jobs. The jobs are run in batches, with a maximum of around 20 jobs running simultaneously.
> We have started to see the following in our region server logs (after < 1 day uptime):
> {noformat}
> java.lang.Error: Maximum lock count exceeded
>     at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:531)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:491)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:873)
>     at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5904)
>     at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5891)
>     at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5798)
>     at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5761)
>     at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:4891)
>     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:4856)
>     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:4838)
>     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:4829)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.mutateRows(HRegionServer.java:4390)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3362)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>     at java.lang.Thread.run(Thread.java:744)
> {noformat}
> Not sure of the cause, but the result is that snapshots cannot be created. We see this in our client logs:
> {noformat}
> Exception in thread "main" org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=test-snapshot-20140619143753294 table=test type=FLUSH } had an error. Procedure test-snapshot-20140619143753294 { waiting=[p3plpadata038.internal,60020,1403140682587, p3plpadata056.internal,60020,1403140865123, p3plpadata072.internal,60020,1403141022569] done=[p3plpadata023.internal,60020,1403140552227, p3plpadata009.internal,60020,1403140487826] }
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
>     at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2907)
>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>     at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via p3plpadata060.internal,60020,1403140935958:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>     at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>     at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
>     ... 10 more
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>     at org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:270)
>     at org.apache.hadoop.hbase.procedure.ProcedureMember.submitSubprocedure(ProcedureMember.java:171)
>     at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:214)
>     at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:172)
>     at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:55)
>     at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:107)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>     at sun.reflect.GeneratedConstructorAccessor17.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:207)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:221)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:121)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
>     at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3327)
>     at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2722)
>     at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2655)
>     at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2596)
>     at
> [SNIP]
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.snapshot.HBaseSnapshotException): org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=test-snapshot-20140619143753294 table=test type=FLUSH } had an error. Procedure test-snapshot-20140619143753294 { waiting=[p3plpadata038.internal,60020,1403140682587, p3plpadata056.internal,60020,1403140865123, p3plpadata072.internal,60020,1403141022569] done=[p3plpadata023.internal,60020,1403140552227, p3plpadata009.internal,60020,1403140487826] }
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
>     at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2907)
>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>     at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via p3plpadata060.internal,60020,1403140935958:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>     at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>     at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
>     ... 10 more
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>     at org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:270)
>     at org.apache.hadoop.hbase.procedure.ProcedureMember.submitSubprocedure(ProcedureMember.java:171)
>     at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:214)
>     at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:172)
>     at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:55)
>     at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:107)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453)
>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1657)
>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1715)
>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isSnapshotDone(MasterProtos.java:42861)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.isSnapshotDone(HConnectionManager.java:2048)
>     at org.apache.hadoop.hbase.client.HBaseAdmin$24.call(HBaseAdmin.java:2725)
>     at org.apache.hadoop.hbase.client.HBaseAdmin$24.call(HBaseAdmin.java:2722)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
>     ... 16 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
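Editor's note on the failure mode in the quoted region server trace: ReentrantReadWriteLock caps the number of outstanding shared (read) holds at 65535, so if the read lock taken by startRegionOperation() is not matched by a release (closeRegionOperation()) on every exit path, the hold count climbs under the nightly snapshot load until fullTryAcquireShared() throws java.lang.Error: Maximum lock count exceeded. The sketch below is NOT HRegion code and NOT the attached patch (11380-v2.txt); the class LockLeakSketch and the methods startOperation()/closeOperation()/doWork() are hypothetical stand-ins that only reproduce the error and contrast the leaky pattern with the try/finally pairing that prevents it.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Standalone, hypothetical sketch (not HRegion, not the attached patch).
// Demonstrates how unreleased read-lock holds on a ReentrantReadWriteLock
// eventually trigger "java.lang.Error: Maximum lock count exceeded".
public class LockLeakSketch {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Stand-in for startRegionOperation(): acquire the shared (read) lock.
    void startOperation() {
        lock.readLock().lock();
    }

    // Stand-in for closeRegionOperation(): release the shared lock.
    void closeOperation() {
        lock.readLock().unlock();
    }

    // Leaky pattern: if doWork() throws before closeOperation() runs,
    // one read-lock hold is never released and the leaks accumulate.
    void leakyOperation() {
        startOperation();
        doWork();
        closeOperation();
    }

    // Safe pattern: every exit path, normal or exceptional, releases the hold.
    void safeOperation() {
        startOperation();
        try {
            doWork();
        } finally {
            closeOperation();
        }
    }

    void doWork() {
        // placeholder for the real region work (e.g. processing rows, snapshot flush)
    }

    public static void main(String[] args) {
        LockLeakSketch sketch = new LockLeakSketch();
        // ReentrantReadWriteLock tracks shared holds in a 16-bit counter, so once
        // 65535 holds are outstanding, the next acquisition throws
        // java.lang.Error: Maximum lock count exceeded.
        for (int i = 0; i < 65_536; i++) {
            sketch.startOperation(); // acquire without ever releasing
        }
    }
}
{code}

Running main() fails with the same java.lang.Error as the region server log once the 65536th unreleased acquisition is attempted; using the try/finally form keeps the hold count bounded no matter how doWork() completes.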