From: "stack (JIRA)"
To: hbase-issues@hadoop.apache.org
Date: Thu, 15 Apr 2010 21:03:26 -0400 (EDT)
Subject: [jira] Commented: (HBASE-2322) deadlock between put and cacheflusher in 0.20 branch

    [ https://issues.apache.org/jira/browse/HBASE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857629#action_12857629 ]

stack commented on HBASE-2322:
------------------------------

At Todd's suggestion I used his version of jcarder because it handles read/write locks. It's available here: http://github.com/toddlipcon/jcarder/tree/lockclasses

I ran a local test with 4 concurrent threads each loading 1M rows -- the same workload that produced the deadlock previously -- and then ran the analysis. It claimed no deadlocks:

{code}
stack:0.20_pre_durability Stack$ java -Xmx4G -jar ~/checkouts/jcarder/dist/jcarder.jar
Opening for reading: /Users/Stack/checkouts/0.20_pre_durability/jcarder_contexts.db
Opening for reading: /Users/Stack/checkouts/0.20_pre_durability/jcarder_events.db

Loaded from database files:
  Nodes: 166109
  Edges: 494196 (excluding 175530380 duplicated)

Cycle analysis result:
  Cycles:          0
  Edges in cycles: 0
  Nodes in cycles: 0
  Max cycle depth: 0
  Max graph depth: 8

No cycles found!
{code}

I've also been running multiple MR jobs over the last day or so. We used to deadlock reliably at about 2% done, and I've passed that point many times since without deadlocking.

I'm going to say that this issue was fixed by HBASE-2248. Will open a new issue if I see it again.
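For context on what that run checked: jcarder is loaded as a -javaagent while the server runs, records every nested lock acquisition as an edge in a directed lock-order graph (held lock -> acquired lock), and the offline pass above searches that graph for cycles, so it can flag potential deadlocks even on runs where no deadlock actually fired. A toy sketch of that cycle check, with made-up lock names standing in for the real ones (this is not jcarder's actual code):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy sketch of the analysis jcarder performs, not its implementation:
// each nested acquisition is an edge "lock held -> lock acquired", and
// any cycle in that graph is a potential deadlock, whether or not the
// unlucky thread interleaving ever happened at runtime.
public class LockOrderGraph {
  private final Map<String, Set<String>> edges = new HashMap<String, Set<String>>();

  // Record that 'acquired' was taken while 'held' was already held.
  public void addEdge(String held, String acquired) {
    Set<String> out = edges.get(held);
    if (out == null) {
      out = new HashSet<String>();
      edges.put(held, out);
    }
    out.add(acquired);
  }

  // Depth-first search; finding a back edge means finding a cycle.
  public boolean hasCycle() {
    Set<String> finished = new HashSet<String>();
    for (String lock : edges.keySet()) {
      if (visit(lock, new HashSet<String>(), finished)) {
        return true;
      }
    }
    return false;
  }

  private boolean visit(String lock, Set<String> onPath, Set<String> finished) {
    if (onPath.contains(lock)) return true;   // back edge: cycle found
    if (finished.contains(lock) || !edges.containsKey(lock)) return false;
    onPath.add(lock);
    for (String next : edges.get(lock)) {
      if (visit(next, onPath, finished)) return true;
    }
    onPath.remove(lock);
    finished.add(lock);
    return false;
  }

  public static void main(String[] args) {
    LockOrderGraph g = new LockOrderGraph();
    // The three edges from the deadlock quoted below (names invented):
    g.addEdge("regionLock", "flushQueueMonitor");
    g.addEdge("flushQueueMonitor", "flusherLock");
    g.addEdge("flusherLock", "regionLock");
    System.out.println(g.hasCycle() ? "Cycle found (potential deadlock)" : "No cycles found!");
  }
}
{code}

Todd's branch adds read/write-lock awareness on top of this, which is why it was the right build to use here: one of the locks in the reported cycle is a ReentrantReadWriteLock.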
> deadlock between put and cacheflusher in 0.20 branch
> ----------------------------------------------------
>
>                 Key: HBASE-2322
>                 URL: https://issues.apache.org/jira/browse/HBASE-2322
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: hbase-2322.png
>
>
> {code}
> Found one Java-level deadlock:
> =============================
> "IPC Server handler 59 on 60020":
>   waiting for ownable synchronizer 0x00007fec9eb050f8, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "IPC Server handler 54 on 60020"
> "IPC Server handler 54 on 60020":
>   waiting to lock monitor 0x000000004190e950 (object 0x00007fec64f25258, a java.util.HashSet),
>   which is held by "regionserver/10.20.20.186:60020.cacheFlusher"
> "regionserver/10.20.20.186:60020.cacheFlusher":
>   waiting for ownable synchronizer 0x00007fec651df998, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
>   which is held by "IPC Server handler 19 on 60020"
> "IPC Server handler 19 on 60020":
>   waiting for ownable synchronizer 0x00007fec9eb050f8, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "IPC Server handler 54 on 60020"
>
> Java stack information for the threads listed above:
> ===================================================
> "IPC Server handler 59 on 60020":
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00007fec9eb050f8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
>         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1299)
>         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1281)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1789)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:577)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> "IPC Server handler 54 on 60020":
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.request(MemStoreFlusher.java:172)
>         - waiting to lock <0x00007fec64f25258> (a java.util.HashSet)
>         at org.apache.hadoop.hbase.regionserver.HRegion.requestFlush(HRegion.java:1549)
>         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1534)
>         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1318)
>         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1281)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1789)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:577)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> "regionserver/10.20.20.186:60020.cacheFlusher":
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00007fec651df998> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:235)
>         - locked <0x00007fec64f25258> (a java.util.HashSet)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:149)
> "IPC Server handler 19 on 60020":
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00007fec9eb050f8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:980)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:873)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:241)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushSomeRegions(MemStoreFlusher.java:352)
>         - locked <0x00007fec64ed96f0> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:321)
>         - locked <0x00007fec64ed96f0> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1783)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:577)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
> Found 1 deadlock.
> {code}
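To make the quoted cycle concrete: handler 54 holds the region lock and waits on the flush-queue HashSet monitor; the cacheFlusher holds that monitor and waits on a ReentrantLock; handler 19 holds that ReentrantLock and waits for the region lock's write half (handler 59 is just a fourth thread queued on the same region lock). A minimal standalone sketch of the same three-lock cycle, with hypothetical names rather than the real HBase 0.20 code:

{code}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Standalone sketch of the three-lock cycle in the trace above.
// Names are invented; the write lock stands in for whatever mode the
// real put handler held. The sleeps only make the fatal interleaving
// reliable so the program deadlocks on every run.
public class FlushDeadlockSketch {
  static final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
  static final Set<String> flushQueue = new HashSet<String>();  // plain monitor
  static final ReentrantLock flusherLock = new ReentrantLock();

  public static void main(String[] args) {
    // Like "IPC Server handler 54": holds the region lock, then tries
    // to request a flush by synchronizing on the flush queue.
    new Thread(new Runnable() {
      public void run() {
        regionLock.writeLock().lock();
        sleep(100);
        synchronized (flushQueue) { flushQueue.add("region"); }
      }
    }, "handler-54").start();

    // Like the "cacheFlusher": holds the flush-queue monitor, then
    // takes the per-flush ReentrantLock.
    new Thread(new Runnable() {
      public void run() {
        synchronized (flushQueue) {
          sleep(100);
          flusherLock.lock();
        }
      }
    }, "cacheFlusher").start();

    // Like "IPC Server handler 19": holds the flush ReentrantLock,
    // then wants the region write lock to do the flush.
    new Thread(new Runnable() {
      public void run() {
        flusherLock.lock();
        sleep(100);
        regionLock.writeLock().lock();
      }
    }, "handler-19").start();
  }

  static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException e) { }
  }
}
{code}

All three threads park forever, and because the write lock and the ReentrantLock are ownable synchronizers, jstack can attribute their owners and print a "Found one Java-level deadlock" report much like the one above. The generic cure for this pattern is to break one edge of the cycle, e.g. release the region lock before requesting the flush, or make the flush request non-blocking.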