hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16824) Writer.flush() can be called on already closed streams in WAL roll
Date Mon, 17 Oct 2016 22:23:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583685#comment-15583685 ]

Enis Soztutar commented on HBASE-16824:
---------------------------------------

This seems to cause a deadlock.
Some threads look like this:
{code}
"22" #222 daemon prio=5 os_prio=31 tid=0x00007fd2063e1800 nid=0x19403 in Object.wait() [0x000070000f373000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:159)
	- locked <0x00000006c45563d8> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:641)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:765)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:807)
	at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster$Appender.run(TestLogRollingNoCluster.java:168)
{code}

Others are blocked in rollWriter:
{code}
"21" #221 daemon prio=5 os_prio=31 tid=0x00007fd205fe7800 nid=0x19203 waiting on condition [0x000070000f270000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006c442d150> (a java.util.concurrent.locks.ReentrantLock$FairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:664)
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:426)
	at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster$Appender.run(TestLogRollingNoCluster.java:153)
{code}

Syncers:
{code}
"sync.4" #198 daemon prio=5 os_prio=31 tid=0x00007fd20a231800 nid=0x16603 waiting on condition [0x000070000dc2e000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006c4598028> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:609)
	at java.lang.Thread.run(Thread.java:745)
{code}

And the RingBufferEventHandler (RBEH):
{code}
"Time-limited test.append-pool1-t1" #199 daemon prio=5 os_prio=31 tid=0x00007fd207f54000 nid=0x15c03 in Object.wait() [0x000070000d71f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:460)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.attainSafePoint(FSHLog.java:1129)
	- locked <0x00000006c45af270> (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1095)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:946)
	at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

I am trying to understand how this happens and will report back.

> Writer.flush() can be called on already closed streams in WAL roll
> ------------------------------------------------------------------
>
>                 Key: HBASE-16824
>                 URL: https://issues.apache.org/jira/browse/HBASE-16824
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Atri Sharma
>            Assignee: Enis Soztutar
>         Attachments: hbase-16824_v1.patch
>
>
> In https://issues.apache.org/jira/browse/HBASE-12074, we hit an error if an async thread
> calls flush on a WAL record already closed as the WAL is being rotated. This JIRA investigates
> if setting the new WAL record path as the first operation during WAL rotation will fix the
> issue.
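
The ordering proposed in the description (publish the new writer first, then close the old one) can be sketched roughly as below. This is only an illustration under assumptions: SketchWriter and SketchWal are hypothetical names and are not the HBase Writer, FSHLog, or AbstractFSWAL classes, and the code is not taken from hbase-16824_v1.patch.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a WAL writer; NOT the HBase Writer interface.
final class SketchWriter {
    private final String path;
    private volatile boolean closed;

    SketchWriter(String path) {
        this.path = path;
    }

    // Fails if the writer was already closed, mimicking the reported error.
    void flush() {
        if (closed) {
            throw new IllegalStateException("flush on closed writer " + path);
        }
    }

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical WAL holder that swaps in the new writer before closing the old one.
final class SketchWal {
    private final AtomicReference<SketchWriter> current;

    SketchWal(SketchWriter first) {
        this.current = new AtomicReference<>(first);
    }

    // Roll: publish the new writer as the first step, then close the old one.
    SketchWriter roll(SketchWriter next) {
        SketchWriter old = current.getAndSet(next);
        old.close();
        return old;
    }

    // Threads that look up the writer after the swap only ever see the new one.
    void flushCurrent() {
        current.get().flush();
    }
}
```

Note that swapping first only narrows the window: a thread that read the old reference before the swap could still call flush() after close(), so some additional coordination between the roller and the syncers would still be needed.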



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
