hbase-issues mailing list archives

From "Carol Pearson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12074) TestLogRollingNoCluster#testContendedLogRolling() failed
Date Fri, 23 Sep 2016 23:22:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517857#comment-15517857 ]

Carol Pearson commented on HBASE-12074:
---------------------------------------

I've recently encountered this bug as well when using Trafodion with a large table load (5.5
billion rows) and HBase 1.0.0-cdh5.4.5.

2016-09-22 05:20:03,211 INFO org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for TRAFODION.JAVABENCH.OE_ORDERLINE_18432,\x00\x00\x00*\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1474490067039.f1ea6356c0c99c7d7ad1a531e003d9cd., current region memstore size 516.50 MB, and 1/2 column families' memstores are being flushed.
2016-09-22 05:20:03,211 INFO org.apache.hadoop.hbase.regionserver.HRegion: Flushing Column Family: 0000001 which was occupying 516.73 MB of memstore.
2016-09-22 05:20:04,494 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 181 ms, current pipeline: [DatanodeInfoWithStorage[172.31.58.52:50010,DS-2faa071d-835d-405c-9246-6c43dd71ddb4,DISK], DatanodeInfoWithStorage[172.31.54.58:50010,DS-f7da0d2e-4b9c-4dad-99ed-0009345f9410,DISK]]
2016-09-22 05:20:06,318 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 153 ms, current pipeline: [DatanodeInfoWithStorage[172.31.58.52:50010,DS-2faa071d-835d-405c-9246-6c43dd71ddb4,DISK], DatanodeInfoWithStorage[172.31.54.58:50010,DS-f7da0d2e-4b9c-4dad-99ed-0009345f9410,DISK]]
2016-09-22 05:20:07,154 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 149 ms, current pipeline: [DatanodeInfoWithStorage[172.31.58.52:50010,DS-2faa071d-835d-405c-9246-6c43dd71ddb4,DISK], DatanodeInfoWithStorage[172.31.54.58:50010,DS-f7da0d2e-4b9c-4dad-99ed-0009345f9410,DISK]]
2016-09-22 05:20:07,860 ERROR org.apache.hadoop.hbase.regionserver.wal.FSHLog: Error syncing, request close of wal
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:176)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1334)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
        ... 2 more
2016-09-22 05:20:07,869 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Rolled WAL /hbase/WALs/isleroyale03.cluster.local,60020,1474421718304/isleroyale03.cluster.local%2C60020%2C1474421718304.null0.1474521602486 with entries=2020, filesize=123.71 MB; new WAL /hbase/WALs/isleroyale03.cluster.local,60020,1474421718304/isleroyale03.cluster.local%2C60020%2C1474421718304.null0.1474521607715
2016-09-22 05:20:07,870 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server isleroyale03.cluster.local,60020,1474421718304: IOE in log roller
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:176)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1334)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
        ... 2 more



> TestLogRollingNoCluster#testContendedLogRolling() failed
> --------------------------------------------------------
>
>                 Key: HBASE-12074
>                 URL: https://issues.apache.org/jira/browse/HBASE-12074
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Stephen Yuan Jiang
>
> TestLogRollingNoCluster#testContendedLogRolling() failed on a 0.98 run. I am trying to understand the context.
> The failure is this: 
> {code}
> java.lang.AssertionError
> 	at org.junit.Assert.fail(Assert.java:86)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.junit.Assert.assertFalse(Assert.java:64)
> 	at org.junit.Assert.assertFalse(Assert.java:74)
> 	at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.testContendedLogRolling(TestLogRollingNoCluster.java:80)
> {code}
> This happened because one of the Appenders calling FSHLog.sync() got an IOException (IOE) caused by a concurrent close:
> {code}
> 4-09-23 16:36:39,530 FATAL [pool-1-thread-1-WAL.AsyncSyncer0] wal.FSHLog$AsyncSyncer(1246): Error while AsyncSyncer sync, request close of hlog
> java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> 	... 2 more
> 2014-09-23 16:36:39,531 INFO  [32] wal.TestLogRollingNoCluster$Appender(137): Caught exception from Appender:32
> java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> 	... 2 more
> 2014-09-23 16:36:39,532 INFO  [19] wal.TestLogRollingNoCluster$Appender(137): Caught exception from Appender:19
> java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> 	... 2 more
> {code}
> The code is: 
> {code}
>   public void sync() throws IOException {
>     try {
>       this.output.flush();
>       this.output.sync();
>     } catch (NullPointerException npe) {
>       // Concurrent close...
>       throw new IOException(npe);
>     }
>   }
> {code}
> I think the test case was written exactly to catch this case:
> {code}
>    * Spin up a bunch of threads and have them all append to a WAL.  Roll the
>    * WAL frequently to try and trigger NPE.
> {code}
> I am reporting this because I don't have much context. It may not be a test issue, but an actual bug.
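
For readers without the context, here is a minimal Java sketch of the failure mode described above. It is not HBase code: the names ConcurrentCloseSketch, SketchWriter and syncAfterCloseWrapsNpe are hypothetical, and it assumes that close() drops the writer's reference to its output stream, which is what the "Concurrent close" comment in the quoted sync() suggests. The sketch forces the close to happen before the sync so that the result is repeatable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration only; this is not the HBase ProtobufLogWriter.
final class ConcurrentCloseSketch {

    /** Stand-in for a WAL writer whose close() nulls the output stream (an assumption). */
    static final class SketchWriter {
        private volatile OutputStream output = new ByteArrayOutputStream();

        /** Same shape as the quoted sync(): an NPE is rethrown as an IOException. */
        void sync() throws IOException {
            try {
                this.output.flush();
            } catch (NullPointerException npe) {
                // Concurrent close...
                throw new IOException(npe);
            }
        }

        /** Closes the stream and drops the reference, as a log roll might do. */
        void close() throws IOException {
            OutputStream out = this.output;
            if (out != null) {
                out.close();
                this.output = null;
            }
        }
    }

    /** Returns true if sync() after close() fails with an IOException caused by an NPE. */
    static boolean syncAfterCloseWrapsNpe() {
        SketchWriter writer = new SketchWriter();
        try {
            writer.sync();   // succeeds while the stream is still open
            writer.close();  // simulates the roll winning the race
        } catch (IOException unexpected) {
            return false;
        }
        try {
            writer.sync();   // the field is now null, so flush() raises an NPE
            return false;
        } catch (IOException e) {
            return e.getCause() instanceof NullPointerException;
        }
    }

    public static void main(String[] args) {
        System.out.println("sync after close wraps NPE: " + syncAfterCloseWrapsNpe());
    }
}
```

In the real writer the two calls come from different threads, so whether the exception appears depends on timing. That would be consistent with a failure that only shows up under contention or heavy load.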



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
