hbase-issues mailing list archives

From "Lars Hofhansl (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
Date Fri, 23 Mar 2012 23:29:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237293#comment-13237293 ]

Lars Hofhansl commented on HBASE-5623:
--------------------------------------

bq. holding the updateLock does not guarantee that the other thread's writer pointer is updated
to the nextWriter

Are you sure about this? This seems to be the main objective of the updateLock. I don't see
any spot where we change this.writer without the updateLock held.
AtomicReference should not be needed (IMHO). Also not a big fan of catching NPE, in your scenario
it also should not be needed (although I could be mistaken).

OK... Lemme do one: I'll integrate my fixed up patch without your new test. I'll run locally
for a while. If it's fine I'll post the patch here and you can poke holes in it. Sounds fair?
                
> Race condition when rolling the HLog and hlogFlush
> --------------------------------------------------
>
>                 Key: HBASE-5623
>                 URL: https://issues.apache.org/jira/browse/HBASE-5623
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.0
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch,
>                      HBASE-5623_v5.patch
>
>
> When doing a ycsb test with a large number of handlers (regionserver.handler.count=60),
> I get the following exceptions:
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
> 	at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
> 	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
> 	at $Proxy1.multi(Unknown Source)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
> 	at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
> {code}
> and 
> {code}
> 	java.lang.NullPointerException
> 		at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
> 		at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
> 		at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
> 		at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
> 		at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
> 		at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
> 		at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
> 		at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
> 		at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
> 		at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
> 		at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
> 		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 		at java.lang.reflect.Method.invoke(Method.java:597)
> 		at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
> 		at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
> {code}
> It seems the root cause of the issue is that we open a new log writer and close the old
> one in HLog#rollWriter() while holding the updateLock, but other threads running syncer() call
> {code} 
> logSyncerThread.hlogFlush(this.writer);
> {code}
> without holding the updateLock. LogSyncer only synchronizes against concurrent appends
> and flush(), but not on the passed writer, which may already have been closed by rollWriter().
> In that case, since SequenceFile#Writer.close() sets its out field to null, we get the NPE.
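
The failure mode described above can be sketched in isolation. This is a minimal illustration, not the HBase code and not any of the attached patches: WalRollSketch, SketchWriter, flushUnderLock and staleReferenceFails are hypothetical names, and the writer is a stand-in whose close() nulls its output field in the way the report describes for SequenceFile#Writer. It shows a writer reference captured before a roll throwing an NPE, and one possible guard where the flush reads and uses the writer under the same lock as the roll.

```java
import java.util.concurrent.locks.ReentrantLock;

public class WalRollSketch {
    /** Hypothetical stand-in for a log writer whose close() nulls its output field. */
    static final class SketchWriter {
        private StringBuilder out = new StringBuilder();

        void append(String edit) { out.append(edit).append('\n'); } // NPE once closed
        int getLength() { return out.length(); }                    // NPE once closed
        void close() { out = null; }
    }

    // Assumption: a single lock guards every change to the writer field.
    private final ReentrantLock updateLock = new ReentrantLock();
    private SketchWriter writer = new SketchWriter();

    /** Swaps in a new writer and closes the old one while holding the lock. */
    SketchWriter rollWriter() {
        updateLock.lock();
        try {
            SketchWriter old = writer;
            writer = new SketchWriter();
            old.close();
            return old;
        } finally {
            updateLock.unlock();
        }
    }

    /** Reads and uses the writer under the same lock, so a roll cannot close it mid-flush. */
    int flushUnderLock(String edit) {
        updateLock.lock();
        try {
            writer.append(edit);
            return writer.getLength();
        } finally {
            updateLock.unlock();
        }
    }

    /** Returns true if a writer reference captured before a roll fails with an NPE. */
    static boolean staleReferenceFails() {
        WalRollSketch log = new WalRollSketch();
        SketchWriter captured = log.writer; // read without the lock, like the unguarded call
        log.rollWriter();                   // in the real race another thread does this
        try {
            captured.append("edit");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale reference fails: " + staleReferenceFails());
        WalRollSketch log = new WalRollSketch();
        log.rollWriter();
        System.out.println("length after locked flush: " + log.flushUnderLock("edit"));
    }
}
```

Holding the lock for the whole flush is only one way to close the window, and it trades some concurrency for simplicity. The fix that was actually committed for this issue may be structured differently.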


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
