hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14807) TestWALLockup is flakey
Date Fri, 20 Nov 2015 00:40:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014877#comment-15014877
] 

stack commented on HBASE-14807:
-------------------------------

Here is what failed

Results :

Failed tests: 
  TestReplicationKillSlaveRS.killOneSlaveRS:34->TestReplicationKillRS.loadTableAndKillRS:88
Waited too much time for queueFailover replication. Waited 18444ms.

Tests run: 1721, Failures: 1, Errors: 0, Skipped: 19

Says:

-------------------------------------------------------------------------------
Test set: org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS
-------------------------------------------------------------------------------
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 111.524 sec - in org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS

Looks like this suite has two tests. The second one timed out then. Says a network error...


2015-11-19 23:15:27,701 WARN  [RS:1;asf906:60179.replicationSource.asf906.gq1.ygridcore.net%2C60179%2C1447974900067,2]
regionserver.HBaseInterClusterReplicationEndpoint(257): Can't replicate because of a local
or network error: 
org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException:
Server asf906.gq1.ygridcore.net,41751,1447974908266 stopping

That is as far as I got.... Seems unrelated. Retrying.


> TestWALLockup is flakey
> -----------------------
>
>                 Key: HBASE-14807
>                 URL: https://issues.apache.org/jira/browse/HBASE-14807
>             Project: HBase
>          Issue Type: Bug
>          Components: flakey, test
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14807.patch, 14807.second.attempt.txt
>
>
> Fails frequently. 
> Looks like this:
> {code}
> 2015-11-12 10:38:51,812 DEBUG [Time-limited test] regionserver.HRegion(3882): Found 0
recovered edits file(s) under /home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d
> 2015-11-12 10:38:51,821 DEBUG [Time-limited test] regionserver.FlushLargeStoresPolicy(56):
hbase.hregion.percolumnfamilyflush.size.lower.bound is not specified, use global config(16777216)
instead
> 2015-11-12 10:38:51,880 DEBUG [Time-limited test] wal.WALSplitter(729): Wrote region
seqId=/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d/recovered.edits/2.seqid
to file, newSeqId=2, maxSeqId=0
> 2015-11-12 10:38:51,881 INFO  [Time-limited test] regionserver.HRegion(868): Onlined
c8694b53368f3301a8d370089120388d; next sequenceid=2
> 2015-11-12 10:38:51,994 ERROR [sync.1] wal.FSHLog$SyncRunner(1226): Error syncing, request
close of WAL
> java.io.IOException: FAKE! Failed to replace a bad datanode...SYNC
> 	at org.apache.hadoop.hbase.regionserver.TestWALLockup$1DodgyFSLog$1.sync(TestWALLockup.java:162)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1222)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-12 10:38:51,997 DEBUG [Thread-4] regionserver.LogRoller(139): WAL roll requested
> 2015-11-12 10:38:52,019 DEBUG [flusher] regionserver.FlushLargeStoresPolicy(100): Since
none of the CFs were above the size, flushing all.
> 2015-11-12 10:38:52,192 INFO  [Thread-4] regionserver.TestWALLockup$1DodgyFSLog(129):
LATCHED
> java.lang.InterruptedException: sleep interrupted
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:146)
> 	at org.apache.hadoop.hbase.regionserver.TestWALLockup.testLockupWhenSyncInMiddleOfZigZagSetup(TestWALLockup.java:245)
> 2015-11-12 10:39:18,609 INFO  [main] regionserver.TestWALLockup(91): Cleaning test directory:
/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> ... then times out after being locked up for 30 seconds.  Writes 50+MB of logs while
spinning.
> Reported as this:
> {code}
> -------------------------------------------------------------------------------
> Test set: org.apache.hadoop.hbase.regionserver.TestWALLockup
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 198.23 sec <<<
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestWALLockup
> testLockupWhenSyncInMiddleOfZigZagSetup(org.apache.hadoop.hbase.regionserver.TestWALLockup)
 Time elapsed: 0.049 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds
> 	at org.apache.log4j.Category.callAppenders(Category.java:205)
> 	at org.apache.log4j.Category.forcedLog(Category.java:391)
> 	at org.apache.log4j.Category.log(Category.java:856)
> 	at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:155)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1386)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1352)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1302)
> 	at org.apache.hadoop.hbase.regionserver.TestWALLockup.testLockupWhenSyncInMiddleOfZigZagSetup(TestWALLockup.java:260)
> {code}
> Failed here most recently:
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.2/364/jdk=latest1.7,label=Hadoop/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message