hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
Date Thu, 31 Jan 2013 10:01:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567496#comment-13567496 ]

Anoop Sam John commented on HBASE-7728:
---------------------------------------

The LogRoller thread is trying to roll over the current log file. It has already acquired
the updateLock.
{code}
HLog#rollWriter(boolean force)
synchronized (updateLock) {
    // Clean up current writer.
    Path oldFile = cleanupCurrentWriter(currentFilenum);
    this.writer = nextWriter;
    ....
}
{code}
As part of cleaning up the current writer, this thread tries to sync the pending writes:
{code}
HLog#cleanupCurrentWriter() {
    ....
        sync();
    }
    this.writer.close();
}
{code}
At the same time, the logSyncer thread was doing a deferred log sync operation:
{code}
HLog#syncer(long txid) {
    ...
    synchronized (flushLock) {
        ....
        try {
            logSyncerThread.hlogFlush(tempWriter, pending);
        } catch (IOException io) {
            synchronized (this.updateLock) {
                // HBASE-4387, HBASE-5623, retry with updateLock held
                tempWriter = this.writer;
                logSyncerThread.hlogFlush(tempWriter, pending);
            }
        }
    }
}
{code}
This thread is holding the flushLock and trying to grab the updateLock. At the same time, the
roller thread, which holds the updateLock, tries to grab the flushLock as part of the cleanup
sync, so neither thread can proceed.
An IOException might have happened in the logSyncer thread (in logSyncerThread.hlogFlush). At
that point the code assumes that a log rollover has already happened. That is why it tries to
write again with the updateLock held, after fetching the writer again. [The writer on which the
IOE happened should have been closed.]
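
The lock-order inversion described above can be illustrated with a small standalone sketch. This is not HBase code: the class name, the two `ReentrantLock` fields standing in for updateLock and flushLock, and the latches are all assumptions made for illustration. `tryLock()` is used instead of a blocking acquire so that the demonstration terminates instead of actually hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionSketch {
    // Hypothetical stand-ins for HLog's updateLock and flushLock monitors.
    static final ReentrantLock updateLock = new ReentrantLock();
    static final ReentrantLock flushLock = new ReentrantLock();

    /** Returns true if neither thread could obtain its second lock, i.e. the wait cycle formed. */
    static boolean cycleForms() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        // Roller order: updateLock, then flushLock (rollWriter -> cleanupCurrentWriter -> sync).
        Thread roller = new Thread(
            () -> acquireInOrder(updateLock, flushLock, bothHoldFirst, bothTried, gotSecond, 0),
            "logRoller");
        // Syncer order: flushLock, then updateLock (the retry in the catch block).
        Thread syncer = new Thread(
            () -> acquireInOrder(flushLock, updateLock, bothHoldFirst, bothTried, gotSecond, 1),
            "logSyncer");

        roller.start();
        syncer.start();
        try {
            roller.join();
            syncer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !gotSecond[0] && !gotSecond[1];
    }

    private static void acquireInOrder(ReentrantLock first, ReentrantLock second,
                                       CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                       boolean[] gotSecond, int slot) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // A blocking lock() here would wait forever; tryLock() just reports the failure.
            gotSecond[slot] = second.tryLock();
            if (gotSecond[slot]) {
                second.unlock();
            }
            // Keep holding the first lock until the other thread has made its attempt.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("wait cycle formed: " + cycleForms());
    }
}
```

With blocking acquires in place of `tryLock()`, both threads would wait on each other indefinitely, which matches the two "waiting to lock" entries in the jstack output quoted below.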

In the roller thread, the writer close happens after the cleanup operation.
So I guess logSyncerThread.hlogFlush threw the IOE for some reason other than a log roll.
Instead of assuming a log roll in the catch block, could we check for
tempWriter == this.writer?
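
One possible reading of this suggestion is sketched below. It is not the HLog code and not necessarily the fix that was eventually committed: the `Writer` interface, the `roll` method, the `volatile` field, and the class name are hypothetical, chosen only to show the idea of retrying under updateLock only when the writer reference has actually changed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RetryAfterRollSketch {
    /** Hypothetical minimal writer abstraction; the real HLog writer has a richer API. */
    interface Writer {
        void flush(List<String> pending) throws IOException;
    }

    private final Object updateLock = new Object();
    private final Object flushLock = new Object();
    // Assumption: volatile so the identity check below can read it without updateLock.
    private volatile Writer writer;

    RetryAfterRollSketch(Writer initial) {
        this.writer = initial;
    }

    /** Stand-in for the roller swapping in a new writer under updateLock. */
    void roll(Writer next) {
        synchronized (updateLock) {
            this.writer = next;
        }
    }

    void syncer(List<String> pending) throws IOException {
        synchronized (flushLock) {
            Writer tempWriter = this.writer;
            try {
                tempWriter.flush(pending);
            } catch (IOException io) {
                if (tempWriter == this.writer) {
                    // No roll has replaced the writer, so the failure has another cause.
                    // Propagate it instead of waiting for updateLock while holding flushLock.
                    throw io;
                }
                synchronized (updateLock) {
                    // The writer was replaced: retry once against the new one.
                    tempWriter = this.writer;
                    tempWriter.flush(pending);
                }
            }
        }
    }

    /** Failure with an unchanged writer should surface as an IOException. */
    static boolean rethrowsWhenWriterUnchanged() {
        RetryAfterRollSketch wal =
            new RetryAfterRollSketch(p -> { throw new IOException("disk error"); });
        try {
            wal.syncer(Arrays.asList("edit"));
            return false;
        } catch (IOException expected) {
            return true;
        }
    }

    /** Failure after a roll should be retried against the new writer. */
    static boolean retriesWhenWriterWasRolled() {
        List<String> written = new ArrayList<>();
        RetryAfterRollSketch[] holder = new RetryAfterRollSketch[1];
        Writer fresh = written::addAll;
        // Simulates a writer that was rolled away just before the flush failed.
        Writer closed = p -> {
            holder[0].roll(fresh);
            throw new IOException("writer closed");
        };
        holder[0] = new RetryAfterRollSketch(closed);
        try {
            holder[0].syncer(Arrays.asList("edit"));
        } catch (IOException e) {
            return false;
        }
        return written.equals(Arrays.asList("edit"));
    }
}
```

In this sketch the syncer never waits for updateLock unless a roll has already replaced the writer, so the scenario in which the roller is still inside its cleanup sync no longer produces a wait cycle. Whether rethrowing is the right reaction to a non-roll IOException in the real code is a separate question that this sketch does not answer.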

I am not an expert in this area; these are my observations from a quick study of the code.
Please correct me if I am wrong. Do you have any logs from when this happened?
                
> deadlock occurs between hlog roller and hlog syncer
> ---------------------------------------------------
>
>                 Key: HBASE-7728
>                 URL: https://issues.apache.org/jira/browse/HBASE-7728
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.2
>         Environment: Linux 2.6.18-164.el5 x86_64 GNU/Linux
>            Reporter: Wang Qiang
>            Priority: Blocker
>
> The hlog roller thread and the hlog syncer thread may deadlock on 'flushLock' and
> 'updateLock', which then causes all 'IPC Server handler' threads to block on hlog append.
> The jstack info is as follows:
> "regionserver60020.logRoller":
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1305)
>         - waiting to lock <0x000000067bf88d58> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:876)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:657)
>         - locked <0x000000067d54ace0> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
>         at java.lang.Thread.run(Thread.java:662)
> "regionserver60020.logSyncer":
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1314)
>         - waiting to lock <0x000000067d54ace0> (a java.lang.Object)
>         - locked <0x000000067bf88d58> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
>         at java.lang.Thread.run(Thread.java:662)
