hbase-issues mailing list archives

From "Atri Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12074) TestLogRollingNoCluster#testContendedLogRolling() failed
Date Thu, 13 Oct 2016 06:45:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571053#comment-15571053 ]

Atri Sharma commented on HBASE-12074:
-------------------------------------

Could a possible fix be to have rollWriter acquire the zig-zag latch and call doReplaceWriter
as its first operation, before attempting to close and flush the log files? New HLog writer
threads would then see newPath already set and would not have to wait for the flush, and
cleanup of the old file could happen in a background thread.
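
A rough sketch of the ordering I have in mind, in plain Java rather than against the real
FSHLog code. SwapFirstRollSketch, SketchWriter, currentWriter and the single-threaded cleaner
are hypothetical names for illustration only; this rollWriter is a stand-in, not the HBase
method. The sketch also leaves out the zig-zag latch, so a sync that picked up the old writer
just before the swap would still have to be drained before the close.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class SwapFirstRollSketch {
    /** Hypothetical stand-in for a WAL writer; not the HBase Writer interface. */
    static final class SketchWriter {
        final String path;
        private volatile boolean closed = false;

        SketchWriter(String path) { this.path = path; }

        void sync() throws IOException {
            if (closed) {
                throw new IOException("sync on closed writer " + path);
            }
        }

        void close() { closed = true; }

        boolean isClosed() { return closed; }
    }

    private final AtomicReference<SketchWriter> current;
    // Assumption: a single background thread is enough for old-file cleanup.
    private final ExecutorService cleaner = Executors.newSingleThreadExecutor();

    SwapFirstRollSketch(String firstPath) {
        current = new AtomicReference<>(new SketchWriter(firstPath));
    }

    SketchWriter currentWriter() { return current.get(); }

    /** Publishes the new writer first, then closes the old one off-thread. */
    SketchWriter rollWriter(String newPath) {
        SketchWriter old = current.getAndSet(new SketchWriter(newPath));
        cleaner.execute(old::close); // cleanup no longer blocks the roll
        return old;
    }

    /** Waits for pending cleanup; returns false on timeout or interrupt. */
    boolean shutdown() {
        cleaner.shutdown();
        try {
            return cleaner.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Callers that fetch currentWriter() after the roll only ever see the new writer; whether the
remaining window for in-flight syncs is acceptable depends on how the latch is used.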

> TestLogRollingNoCluster#testContendedLogRolling() failed
> --------------------------------------------------------
>
>                 Key: HBASE-12074
>                 URL: https://issues.apache.org/jira/browse/HBASE-12074
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Stephen Yuan Jiang
>
> TestLogRollingNoCluster#testContendedLogRolling() failed on a 0.98 run. I am trying to
> understand the context.
> The failure is this: 
> {code}
> java.lang.AssertionError
> 	at org.junit.Assert.fail(Assert.java:86)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.junit.Assert.assertFalse(Assert.java:64)
> 	at org.junit.Assert.assertFalse(Assert.java:74)
> 	at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.testContendedLogRolling(TestLogRollingNoCluster.java:80)
> {code}
> It was caused by one of the Appenders calling FSHLog.sync(), which threw an IOException
> because of a concurrent close:
> {code}
> 4-09-23 16:36:39,530 FATAL [pool-1-thread-1-WAL.AsyncSyncer0] wal.FSHLog$AsyncSyncer(1246): Error while AsyncSyncer sync, request close of hlog
> java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> 	... 2 more
> 2014-09-23 16:36:39,531 INFO  [32] wal.TestLogRollingNoCluster$Appender(137): Caught exception from Appender:32
> java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> 	... 2 more
> 2014-09-23 16:36:39,532 INFO  [19] wal.TestLogRollingNoCluster$Appender(137): Caught exception from Appender:19
> java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> 	... 2 more
> {code}
> The code is: 
> {code}
>   public void sync() throws IOException {
>     try {
>       this.output.flush();
>       this.output.sync();
>     } catch (NullPointerException npe) {
>       // Concurrent close...
>       throw new IOException(npe);
>     }
>   }
> {code}
> I think the test case was written exactly to catch this case:
> {code}
>    * Spin up a bunch of threads and have them all append to a WAL.  Roll the
>    * WAL frequently to try and trigger NPE.
> {code}
> This is why I am reporting it, since I don't have much context. It may not be a test issue,
> but an actual bug.
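
For reference, one common way to avoid catching the NullPointerException in a sync() like the
one quoted above is to read the field once into a local and null-check it. This is only a
sketch under assumptions: LocalCopySyncSketch is a hypothetical class, java.io.OutputStream
stands in for the writer's real output stream, and the boolean return value is invented here
so the caller can tell a flush from a no-op.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class LocalCopySyncSketch {
    // Hypothetical stand-in for the writer's output field; volatile so a
    // concurrent close() is visible to the syncing thread.
    private volatile OutputStream output = new ByteArrayOutputStream();

    /** Returns true if a flush happened, false if the writer was already closed. */
    boolean sync() throws IOException {
        OutputStream out = this.output; // read the field exactly once
        if (out == null) {
            return false; // close() already ran; nothing left to flush
        }
        out.flush();
        return true;
    }

    /** Clears the field before closing so later sync() calls see null. */
    void close() throws IOException {
        OutputStream out = this.output;
        this.output = null;
        if (out != null) {
            out.close();
        }
    }
}
```

Whether a silent no-op is acceptable for a WAL sync is a separate question; throwing a
descriptive IOException from the null branch may be the safer choice.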



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
