hbase-user mailing list archives

From wangkai <wk335681...@163.com>
Subject regionserver stuck
Date Wed, 16 Dec 2015 03:24:35 GMT
Hi, all:

   Our HBase cluster often runs into a strange problem: when one machine goes down, we can’t query or bulkload anymore, as if the whole cluster had crashed. So we took a look at the jstack of the regionserver and found that some threads were blocked, waiting for a lock. Here is the jstack of the regionserver:



"RpcServer.handler=100,port=60020" daemon prio=10 tid=0x00007f6a48b69000 nid=0x6b23 waiting for monitor entry [0x00007f66c9a70000]

   java.lang.Thread.State: BLOCKED (on object monitor)

        at org.apache.hadoop.hdfs.DFSInputStream.getFileLength(DFSInputStream.java:354)

        - waiting to lock <0x00000004dc908f30> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1279)

        at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:90)

        at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1224)

        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1432)

        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)

        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)

        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)

        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:494)

        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:515)

        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:238)

        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:153)

        at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)

        at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)

        at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1761)

        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3687)

        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1872)

        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1858)

        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1835)

        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3068)

        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)

        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)

        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)

        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)

        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)

        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)

        at java.lang.Thread.run(Thread.java:745)




   Locked ownable synchronizers:

        - None

"regionserver60020-smallCompactions-1428948439258" daemon prio=10 tid=0x00007f6a5001b000 nid=0x6ca1 runnable [0x00007f66c5524000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)

        - locked <0x00000004e7ca3340> (a sun.nio.ch.Util$2)

        - locked <0x00000004e7ca3328> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000004e762abf8> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)

        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:173)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:138)

        - locked <0x0000000514bd5730> (a org.apache.hadoop.hdfs.RemoteBlockReader2)

        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:686)

        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:742)

        - eliminated <0x00000004dc908f30> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:799)

        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)

        - locked <0x00000004dc908f30> (a org.apache.hadoop.hdfs.DFSInputStream)

        at java.io.DataInputStream.read(DataInputStream.java:149)

        at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:563)

        at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1215)

        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1432)

        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)

        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)

        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:605)

        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.next(HFileReaderV2.java:1065)

        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:137)

        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)

        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507)

        at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:217)

        at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:76)

        at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:109)

        at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1080)

        at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1409)

        at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:475)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:745)




   Locked ownable synchronizers:

        - <0x00000004bfe5f1d8> (a java.util.concurrent.ThreadPoolExecutor$Worker)

        - <0x00000004dc0160c8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

        - <0x0000000514700ec0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)




"RpcServer.handler=102,port=60020" daemon prio=10 tid=0x00007f6a48b6d000 nid=0x6b25 waiting on condition [0x00007f66c986e000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000004dcd49270> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)

        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)

        at org.apache.hadoop.hbase.regionserver.HStore.bulkLoadHFile(HStore.java:678)

        at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3574)

        at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3496)

        at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFile(HRegionServer.java:3262)

        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29499)

        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)

        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)

        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)

        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)

        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)

        at java.lang.Thread.run(Thread.java:745)




   Locked ownable synchronizers:

        - None




"RpcServer.handler=113,port=60020" daemon prio=10 tid=0x00007f6a48b85000 nid=0x6b30 waiting on condition [0x00007f66c8d63000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000004dcd49270> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)

        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)

        at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1754)

        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3687)

        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1872)

        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1858)

        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1835)

        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3068)

        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)

        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)

        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)

        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)

        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)

        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)

        at java.lang.Thread.run(Thread.java:745)




   Locked ownable synchronizers:

        - None







Almost all of the RpcServer handler threads were stuck, because the smallCompactions thread held the DFSInputStream lock <0x00000004dc908f30> while waiting for socket IO that never became ready. From the traces the chain seems to be: the scan handler that already holds the store read lock is blocked waiting for that DFSInputStream monitor, the bulkload handler is queued for the write lock of the ReentrantReadWriteLock <0x00000004dcd49270>, and the other scan handlers queue up behind that writer for the read lock. I don’t understand why the IO took so long to become ready, and the stranger thing is that I can’t even find a SocketTimeoutException in the regionserver’s log.
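
To make this concrete, here is a minimal, self-contained sketch of the pattern we believe we are hitting (plain Java, not HBase code; the class name, thread names, and lock roles are placeholders chosen only to mirror the jstack above): one thread holds an object monitor while stuck on IO, a reader that already holds the store read lock blocks on that monitor, a bulkload writer queues for the write lock, and later readers queue behind that writer.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StuckRegionServerSketch {

    // Stands in for the shared org.apache.hadoop.hdfs.DFSInputStream (<0x00000004dc908f30>).
    static final Object streamMonitor = new Object();
    // Stands in for the HStore ReentrantReadWriteLock (<0x00000004dcd49270>).
    static final ReentrantReadWriteLock storeLock = new ReentrantReadWriteLock();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch monitorHeld = new CountDownLatch(1);

        // "smallCompactions": takes the stream monitor, then sits in a read that never returns.
        Thread compaction = newThread("smallCompactions", () -> {
            synchronized (streamMonitor) {
                monitorHeld.countDown();
                sleepForever(); // stands in for the socket read stuck in epollWait
            }
        });

        // Scan handler: already holds the store read lock, now needs the stream monitor.
        Thread scan = newThread("RpcServer.handler=100", () -> {
            storeLock.readLock().lock();
            synchronized (streamMonitor) { /* never entered */ }
        });

        // Bulkload handler: parks waiting for the store write lock.
        Thread bulkload = newThread("RpcServer.handler=102", () -> {
            storeLock.writeLock().lock();
        });

        // A later scan handler queues for the read lock behind the waiting writer.
        Thread lateScan = newThread("RpcServer.handler=113", () -> {
            storeLock.readLock().lock();
        });

        compaction.start();
        monitorHeld.await();
        scan.start();
        Thread.sleep(200);   // let the scan handler grab the read lock first
        bulkload.start();
        Thread.sleep(200);
        lateScan.start();
        Thread.sleep(200);

        System.out.println("handler=100 " + scan.getState());     // BLOCKED (on object monitor)
        System.out.println("handler=102 " + bulkload.getState()); // WAITING (parked on write lock)
        System.out.println("handler=113 " + lateScan.getState()); // WAITING (parked on read lock)
    }

    static Thread newThread(String name, Runnable r) {
        Thread t = new Thread(r, name);
        t.setDaemon(true);   // so this sketch exits after printing the thread states
        return t;
    }

    static void sleepForever() {
        try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) { }
    }
}

Running it prints roughly the same thread states that show up in the dump, which is why we think everything ultimately hangs off the one stuck DataNode read.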

Does anybody know the reason for this problem?

Thank you in advance!




cluster version:

hbase-0.98.1-hadoop2

hadoop-2.3.0-cdh5.0.0
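
P.S. For what it’s worth, we would have expected the stuck DataNode read to eventually hit the HDFS client read timeout and surface a SocketTimeoutException, which is why the silent hang confuses us. Below is a small sketch of how we would check the timeout the client actually resolves; it assumes dfs.client.socket-timeout is the property that governs this read path and that 60000 ms is its default, which we have not verified.

import org.apache.hadoop.conf.Configuration;

public class PrintDfsClientReadTimeout {
    public static void main(String[] args) {
        // Loads core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        // The property name and the 60000 ms fallback are our assumptions about the DataNode read path.
        int readTimeoutMs = conf.getInt("dfs.client.socket-timeout", 60000);
        System.out.println("effective dfs.client.socket-timeout = " + readTimeoutMs + " ms");
    }
}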