hbase-dev mailing list archives

From Gaojinchao <gaojinc...@huawei.com>
Subject re: TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS fails on Jenkins
Date Fri, 04 Nov 2011 06:45:36 GMT
I can reproduce it:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hbase.master.TestMasterFailover
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 90.954 sec <<< FAILURE!

Results :

Failed tests:   testMasterFailoverWithMockedRITOnDeadRS(org.apache.hadoop.hbase.master.TestMasterFailover):
region=enabledTable,bbb,1319241846089.6b022df3f7399ee977683c6c5e4be009.

Tests run: 4, Failures: 1, Errors: 0, Skipped: 0

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: November 4, 2011 13:21
To: dev@hbase.apache.org; lars hofhansl
Subject: Re: TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS fails on Jenkins

Please run the test in a loop.

I can reproduce the failure on my MacBook.

Gary logged a JIRA about the JMX exceptions. They're non-essential.

Cheers
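
A minimal Java sketch of "run the test in a loop", for anyone who wants to script it. Assumptions: the class name RepeatUntilFailure, the helper names, the run limit of 20, and the exact Maven command line are illustrative and not from this thread; only the test class name comes from the messages here.

```java
import java.io.IOException;
import java.util.function.IntSupplier;

class RepeatUntilFailure {

    /**
     * Calls runOnce up to maxRuns times and returns the 1-based index of
     * the first run that returned a nonzero exit code, or -1 if every run
     * returned 0.
     */
    static int firstFailingRun(IntSupplier runOnce, int maxRuns) {
        for (int run = 1; run <= maxRuns; run++) {
            if (runOnce.getAsInt() != 0) {
                return run;
            }
        }
        return -1;
    }

    /**
     * Runs the test class once through Maven and returns the process exit
     * code. The command line is an assumption; adjust it to your build.
     */
    static int runMavenOnce() {
        try {
            Process p = new ProcessBuilder("mvn", "test", "-Dtest=TestMasterFailover")
                    .inheritIO()
                    .start();
            return p.waitFor();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int failed = firstFailingRun(RepeatUntilFailure::runMavenOnce, 20);
        System.out.println(failed < 0
                ? "no failure in 20 runs"
                : "first failure on run " + failed);
    }
}
```

Stopping at the first failing run keeps that run's logs as the most recent ones on disk, which may make an intermittent failure easier to inspect.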

On Thursday, November 3, 2011, lars hofhansl <lhofhansl@yahoo.com> wrote:
> When I run that locally (latest trunk) it passes:
>
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> Running org.apache.hadoop.hbase.master.TestMasterFailover
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 69.721 sec
>
> Results :
>
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
>
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESSFUL
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 2 minutes 29 seconds
> [INFO] Finished at: Thu Nov 03 22:06:25 PDT 2011
> [INFO] Final Memory: 58M/286M
> [INFO] ------------------------------------------------------------------------
>
>
> In the log I see some JMX related exceptions, but their timing did not
> suggest any potentially hanging threads.
>
> (Linux, OpenJDK 1.6 64 bit, needed to set umask to 022)
>
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Ted Yu <yuzhihong@gmail.com>
> To: dev@hbase.apache.org
> Cc:
> Sent: Thursday, November 3, 2011 8:55 PM
> Subject: TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS fails on Jenkins
>
> Hi,
> Currently TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS
> <https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/105/testReport/org.apache.hadoop.hbase.master/TestMasterFailover/testMasterFailoverWithMockedRITOnDeadRS/>
> <https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestMasterFailover/testMasterFailoverWithMockedRITOnDeadRS/>
> consistently fails on 0.92 and TRUNK.
>
> I intended to log a JIRA but https://issues.apache.org is giving me a 503
> error.
>
> I briefly went over the code.
> I think after each region is added to regionsThatShouldBeOnline, we should
> log the name of the region:
>     // Region of enabled on dead server gets closed but not ack'd by master
>     region = enabledAndOnDeadRegions.remove(0);
>     regionsThatShouldBeOnline.add(region);
>     log("2. expecting " + region.toString() + " to be online: ");
>
> so that if the assertion below fails we know what type of scenario wasn't
> working:
>     for (HRegionInfo hri : regionsThatShouldBeOnline) {
>       assertTrue("region=" + hri.getRegionNameAsString(),
>           onlineRegions.contains(hri));
>     }
>
> From the above-mentioned test output I saw a lot of:
>
> 2011-11-03 21:52:58,652 FATAL [Thread-558.logSyncer] wal.HLog(1106): Could not sync. Requesting close of hlog
> java.io.IOException: Reflection
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:225)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1090)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1194)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1056)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:223)
>     ... 4 more
> Caused by: java.io.IOException: DFSOutputStream is closed
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3483)
>     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>     ... 8 more
>
> Maybe they have something to do with regions stuck in RIT.
>
> Cheers
>
>
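
The logging idea in the quoted message (record which scenario each expected-online region belongs to, so a failed assertion names the scenario) can be sketched in isolation. This is a simplified illustration and not the actual TestMasterFailover code: it uses plain String region names instead of HRegionInfo, and the class name ExpectedOnlineRegions, its methods, and the sample region names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ExpectedOnlineRegions {

    // Region name -> description of the scenario that put it on the list.
    private final Map<String, String> scenarioByRegion = new LinkedHashMap<>();

    /** Records an expectation and logs it as soon as it is registered. */
    void expectOnline(String regionName, String scenario) {
        scenarioByRegion.put(regionName, scenario);
        System.out.println("expecting " + regionName + " to be online: " + scenario);
    }

    /** Returns one message per expected region that is not actually online. */
    List<String> describeMissing(Set<String> onlineRegions) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> e : scenarioByRegion.entrySet()) {
            if (!onlineRegions.contains(e.getKey())) {
                missing.add("region=" + e.getKey() + " scenario=" + e.getValue());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        ExpectedOnlineRegions expected = new ExpectedOnlineRegions();
        // Hypothetical region names; the scenario text mirrors the code comment above.
        expected.expectOnline("regionA", "enabled region on live server");
        expected.expectOnline("regionB",
                "enabled region on dead server, closed but not ack'd by master");

        Set<String> online = new HashSet<>(Arrays.asList("regionA"));
        for (String message : expected.describeMissing(online)) {
            System.out.println("NOT ONLINE: " + message);
        }
    }
}
```

Collecting all missing regions before failing, instead of asserting inside the loop, would report every broken scenario in a single run.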