hbase-user mailing list archives

From Matt Davies <matt.dav...@tynt.com>
Subject Deadlocked Regionserver process
Date Thu, 14 Jul 2011 18:36:32 GMT
Hey everyone,

We periodically see a situation where the regionserver process is still in the
process list and its ZooKeeper thread keeps sending keepalives, so the master
won't remove it from the active list, yet the regionserver will not serve data.

We're running Hadoop (cdh3u0) and HBase 0.90.3 (Apache version), under load
from an internal testing tool.


I've taken a jstack of the process and found this:

Found one Java-level deadlock:
=============================
"IPC Server handler 99 on 60020":
  waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by "IPC Server handler 64 on 60020"
"IPC Server handler 64 on 60020":
  waiting for ownable synchronizer 0x00002aaab8eea130, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "regionserver60020.cacheFlusher"
"regionserver60020.cacheFlusher":
  waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by "IPC Server handler 64 on 60020"

Java stack information for the threads listed above:
===================================================
"IPC Server handler 99 on 60020":
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
        - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
        at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
"IPC Server handler 64 on 60020":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
        - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
        at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
"regionserver60020.cacheFlusher":
        at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
        - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
        at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
        at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
        at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
        at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
        at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
        at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
        at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
        at java.util.TimeZone.getDisplayName(TimeZone.java:350)
        at java.util.Date.toString(Date.java:1025)
        at java.lang.String.valueOf(String.java:2826)
        at java.lang.StringBuilder.append(StringBuilder.java:115)
        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
        at java.lang.String.valueOf(String.java:2826)
        at java.lang.StringBuilder.append(StringBuilder.java:115)
        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
        - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
        - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
        - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
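
As far as I can tell, the cycle in the dump reduces to two threads taking an object monitor and a ReentrantLock in opposite orders. Here is a minimal sketch of that pattern, not the HBase code: LockOrderSketch, the thread names and the two local locks are hypothetical stand-ins. It uses ThreadMXBean.findDeadlockedThreads() from the JDK (Java 8+ syntax) to show that the JVM itself can report this kind of cycle, assuming the JVM supports monitoring of ownable synchronizers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {

    /** Starts two daemon threads that deadlock, then returns the thread ids the JVM reports. */
    public static long[] reproduceAndDetect() throws InterruptedException {
        // Hypothetical stand-ins for the MemStoreFlusher monitor and its ReentrantLock.
        final Object monitor = new Object();
        final ReentrantLock lock = new ReentrantLock();
        // Makes sure each thread holds its first lock before trying for the second.
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // Like the IPC handler in the dump: monitor first, then the ReentrantLock.
        Thread handler = new Thread(() -> {
            synchronized (monitor) {
                bothHoldFirst.countDown();
                awaitQuietly(bothHoldFirst);
                lock.lock(); // never acquired: the other thread holds it
                lock.unlock();
            }
        }, "handler-sketch");

        // Like the cacheFlusher in the dump: ReentrantLock first, then the monitor.
        Thread flusher = new Thread(() -> {
            lock.lock();
            try {
                bothHoldFirst.countDown();
                awaitQuietly(bothHoldFirst);
                synchronized (monitor) { } // never entered: the other thread holds it
            } finally {
                lock.unlock();
            }
        }, "flusher-sketch");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        handler.setDaemon(true);
        flusher.setDaemon(true);
        handler.start();
        flusher.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Poll for up to about 5 seconds; findDeadlockedThreads() returns null when none is found.
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            Thread.sleep(50);
        }
        return new long[0];
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(reproduceAndDetect())) {
            if (info != null) {
                System.out.println("deadlocked: " + info.getThreadName());
            }
        }
    }
}
```

Taking the two locks in the same order on both paths would break the cycle in this sketch; whether that is practical in MemStoreFlusher itself I can't say.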


Any ideas on how I could prevent this, or let the master know about it? I've
written an app that checks all regionservers periodically for such a lockup,
but I can't run it constantly.
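
To illustrate the kind of check I mean, here is a rough sketch of doing it inside the JVM with a small watchdog thread instead of an external app. This is an assumption-laden sketch, not something that exists in HBase: DeadlockWatchdog, its methods and the callback are hypothetical names, and it relies only on the JDK's ThreadMXBean (Java 8+ syntax). What the callback should do on detection, for example log and abort the process so that the ZooKeeper session eventually expires, is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class DeadlockWatchdog {

    /** Returns the names of threads stuck in a monitor or ownable-synchronizer deadlock. */
    public static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // null entries mean the thread has since exited
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /**
     * Runs the check every periodSeconds on a daemon thread and hands any
     * deadlocked thread names to the (hypothetical) onDeadlock callback.
     */
    public static ScheduledExecutorService start(long periodSeconds,
                                                 Consumer<List<String>> onDeadlock) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "deadlock-watchdog");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            List<String> names = deadlockedThreadNames();
            if (!names.isEmpty()) {
                onDeadlock.accept(names);
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        // Example wiring: print the names; a real callback might abort the process instead.
        ScheduledExecutorService scheduler =
            start(30, names -> System.err.println("Deadlock detected: " + names));
        Thread.sleep(1000);
        scheduler.shutdownNow();
    }
}
```

The check itself does not take any of the application's locks, so it should keep running even when the handler threads are stuck, but I haven't tried it against a live regionserver.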

I can provide more of the jstack if that is helpful.

-Matt
