Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 207CD180C8 for ; Thu, 20 Aug 2015 22:33:46 +0000 (UTC) Received: (qmail 16007 invoked by uid 500); 20 Aug 2015 22:33:45 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 15962 invoked by uid 500); 20 Aug 2015 22:33:45 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 15735 invoked by uid 99); 20 Aug 2015 22:33:45 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2015 22:33:45 +0000 Date: Thu, 20 Aug 2015 22:33:45 +0000 (UTC) From: "stack (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-14274?page=3Dcom.atlassia= n.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D147= 05903#comment-14705903 ]=20 stack commented on HBASE-14274: ------------------------------- This related to your change? Should protect against it? {code} 119113 2015-08-20 15:31:10,704 WARN [HBase-Metrics2-1] impl.MetricsConfig(= 124): Cannot locate configuration: tried hadoop-metrics2-hbase.properties,h= adoop-metrics2.properties 119114 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] lib.MethodMetric$2(= 118): Error invoking method getBlocksTotal 119115 java.lang.reflect.InvocationTargetException 119116 =E2=80=BA at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown = Source) 119117 =E2=80=BA at sun.reflect.DelegatingMethodAccessorImpl.invoke(Deleg= atingMethodAccessorImpl.java:43) 119118 =E2=80=BA at java.lang.reflect.Method.invoke(Method.java:606) 119119 =E2=80=BA at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapsho= t(MethodMetric.java:111) 119120 =E2=80=BA at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(= MethodMetric.java:144) 119121 =E2=80=BA at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapsh= ot(MetricsRegistry.java:387) 119122 =E2=80=BA at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1= .getMetrics(MetricsSourceBuilder.java:79) 119123 =E2=80=BA at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.= getMetrics(MetricsSourceAdapter.java:195) 119124 =E2=80=BA at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.= updateJmxCache(MetricsSourceAdapter.java:172) 119125 =E2=80=BA at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.= getMBeanInfo(MetricsSourceAdapter.java:151) 119126 =E2=80=BA at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor= .getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) 119127 =E2=80=BA at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor= .registerMBean(DefaultMBeanServerInterceptor.java:319) 119128 =E2=80=BA at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(= JmxMBeanServer.java:522) 119129 =E2=80=BA at org.apache.hadoop.metrics2.util.MBeans.register(MBean= s.java:57) 119130 =E2=80=BA at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.= startMBeans(MetricsSourceAdapter.java:221) 119131 =E2=80=BA at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.= start(MetricsSourceAdapter.java:96) 119132 =E2=80=BA at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.reg= isterSource(MetricsSystemImpl.java:245) 119133 =E2=80=BA at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.p= ostStart(MetricsSystemImpl.java:229) 119134 =E2=80=BA at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown = Source) 119135 =E2=80=BA at sun.reflect.DelegatingMethodAccessorImpl.invoke(Deleg= atingMethodAccessorImpl.java:43) 119136 =E2=80=BA at java.lang.reflect.Method.invoke(Method.java:606) 119137 =E2=80=BA at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.i= nvoke(MetricsSystemImpl.java:290) 119138 =E2=80=BA at com.sun.proxy.$Proxy13.postStart(Unknown Source) 119139 =E2=80=BA at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sta= rt(MetricsSystemImpl.java:185) 119140 =E2=80=BA at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCac= heBusterRunnable.run(JmxCacheBuster.java:81) 119141 =E2=80=BA at java.util.concurrent.Executors$RunnableAdapter.call(E= xecutors.java:471) 119142 =E2=80=BA at java.util.concurrent.FutureTask.run(FutureTask.java:2= 62) 119143 =E2=80=BA at java.util.concurrent.ScheduledThreadPoolExecutor$Sche= duledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) 119144 =E2=80=BA at java.util.concurrent.ScheduledThreadPoolExecutor$Sche= duledFutureTask.run(ScheduledThreadPoolExecutor.java:292) 119145 =E2=80=BA at java.util.concurrent.ThreadPoolExecutor.runWorker(Thr= eadPoolExecutor.java:1145) 119146 =E2=80=BA at java.util.concurrent.ThreadPoolExecutor$Worker.run(Th= readPoolExecutor.java:615) 119147 =E2=80=BA at java.lang.Thread.run(Thread.java:744) 119148 Caused by: java.lang.NullPointerException 119149 =E2=80=BA at org.apache.hadoop.hdfs.server.blockmanagement.BlocksM= ap.size(BlocksMap.java:198) 119150 =E2=80=BA at org.apache.hadoop.hdfs.server.blockmanagement.BlockMa= nager.getTotalBlocks(BlockManager.java:3158) 119151 =E2=80=BA at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.g= etBlocksTotal(FSNamesystem.java:5652) 119152 =E2=80=BA ... 32 more {code} > Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs Metric= sRegionAggregateSourceImpl > -------------------------------------------------------------------------= -------------------------- > > Key: HBASE-14274 > URL: https://issues.apache.org/jira/browse/HBASE-14274 > Project: HBase > Issue Type: Sub-task > Components: test > Reporter: stack > Attachments: 23612.stack, HBASE-14274-v1.patch, HBASE-14274.patch > > > Looking into parent issue, got a hang locally of TestDistributedLogReplay= . > We have region closes here: > {code} > "RS_CLOSE_META-localhost:59610-0" prio=3D5 tid=3D0x00007ff65c03f800 nid= =3D0x54347 waiting on condition [0x000000011f7ac000] > java.lang.Thread.State: WAITING (parking) > =09at sun.misc.Unsafe.park(Native Method) > =09- parking to wait for <0x000000075636d8c0> (a java.util.concurrent.lo= cks.ReentrantReadWriteLock$NonfairSync) > =09at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > =09at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckI= nterrupt(AbstractQueuedSynchronizer.java:834) > =09at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued= (AbstractQueuedSynchronizer.java:867) > =09at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Abstr= actQueuedSynchronizer.java:1197) > =09at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Re= entrantReadWriteLock.java:945) > =09at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceIm= pl.deregister(MetricsRegionAggregateSourceImpl.java:78) > =09at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(= MetricsRegionSourceImpl.java:120) > =09at org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsReg= ion.java:41) > =09at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1= 500) > =09at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:134= 4) > =09- locked <0x00000007ff878190> (a java.lang.Object) > =09at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.pro= cess(CloseRegionHandler.java:102) > =09at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java= :103) > =09at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecuto= r.java:1145) > =09at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecut= or.java:615) > =09at java.lang.Thread.run(Thread.java:744) > {code} > They are trying to MetricsRegionAggregateSourceImpl.deregister. They want= to get a write lock on this classes local ReentrantReadWriteLock while hol= ding MetricsRegionSourceImpl's readWriteLock write lock. > Then, elsewhere the JmxCacheBuster is running trying to get metrics with = above locks held in reverse: > {code} > "HBase-Metrics2-1" daemon prio=3D5 tid=3D0x00007ff65e14b000 nid=3D0x59a03= waiting on condition [0x0000000140ea5000] > java.lang.Thread.State: WAITING (parking) > =09at sun.misc.Unsafe.park(Native Method) > =09- parking to wait for <0x00000007cade1480> (a java.util.concurrent.lo= cks.ReentrantReadWriteLock$NonfairSync) > =09at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > =09at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckI= nterrupt(AbstractQueuedSynchronizer.java:834) > =09at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShar= ed(AbstractQueuedSynchronizer.java:964) > =09at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared= (AbstractQueuedSynchronizer.java:1282) > =09at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Ree= ntrantReadWriteLock.java:731) > =09at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapsh= ot(MetricsRegionSourceImpl.java:193) > =09at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceIm= pl.getMetrics(MetricsRegionAggregateSourceImpl.java:115) > =09at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(Met= ricsSourceAdapter.java:195) > =09at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache= (MetricsSourceAdapter.java:172) > =09at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(M= etricsSourceAdapter.java:151) > =09at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanCl= assName(DefaultMBeanServerInterceptor.java:333) > =09at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean= (DefaultMBeanServerInterceptor.java:319) > =09at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer= .java:522) > =09at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) > =09at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(Me= tricsSourceAdapter.java:221) > =09- locked <0x00000007e654bdc0> (a org.apache.hadoop.metrics2.impl.Metri= csSourceAdapter) > =09at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsS= ourceAdapter.java:96) > =09at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(Me= tricsSystemImpl.java:245) > =09- locked <0x0000000754302660> (a org.apache.hadoop.metrics2.impl.Metri= csSystemImpl) > =09at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(Metri= csSystemImpl.java:229) > =09at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source) > =09at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc= essorImpl.java:43) > =09at java.lang.reflect.Method.invoke(Method.java:606) > =09at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsS= ystemImpl.java:290) > =09at com.sun.proxy.$Proxy13.postStart(Unknown Source) > =09at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSyst= emImpl.java:185) > =09- locked <0x0000000754302660> (a org.apache.hadoop.metrics2.impl.Metri= csSystemImpl) > =09at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnab= le.run(JmxCacheBuster.java:81) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)