hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Enis Soztutar (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
Date Mon, 26 Mar 2012 22:52:29 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238938#comment-13238938
] 

Enis Soztutar commented on HBASE-5623:
--------------------------------------

Thanks Lars for pushing this. Just as a note, I just tested the final version of the patch
on a 4 node cluster with ycsb -threads 100. No problems.  
                
> Race condition when rolling the HLog and hlogFlush
> --------------------------------------------------
>
>                 Key: HBASE-5623
>                 URL: https://issues.apache.org/jira/browse/HBASE-5623
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.0
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt,
HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch
>
>
> When doing a ycsb test with a large number of handlers (regionserver.handler.count=60),
I get the following exceptions:
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
> 	at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
> 	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
> 	at $Proxy1.multi(Unknown Source)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
> 	at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
> {code}
> and 
> {code}
> 	java.lang.NullPointerException
> 		at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
> 		at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
> 		at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
> 		at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
> 		at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
> 		at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
> 		at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
> 		at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
> 		at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
> 		at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
> 		at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
> 		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 		at java.lang.reflect.Method.invoke(Method.java:597)
> 		at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
> 		at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
> {code}
> It seems the root cause of the issue is that we open a new log writer and close the old
one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls
> {code} 
> logSyncerThread.hlogFlush(this.writer);
> {code}
> without holding the updateLock. LogSyncer only synchronizes against concurrent appends
and flush(), but not on the passed writer, which can be closed already by rollWriter(). In
this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message