hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-13732) TestHBaseFsck#testParallelWithRetriesHbck fails intermittently
Date Thu, 21 May 2015 00:12:59 GMT
Stephen Yuan Jiang created HBASE-13732:

             Summary: TestHBaseFsck#testParallelWithRetriesHbck fails intermittently
                 Key: HBASE-13732
                 URL: https://issues.apache.org/jira/browse/HBASE-13732
             Project: HBase
          Issue Type: Bug
          Components: hbck, test
    Affects Versions: 1.1.0, 2.0.0, 1.2.0
            Reporter: Stephen Yuan Jiang
            Assignee: Stephen Yuan Jiang
            Priority: Minor
             Fix For: 2.0.0, 1.2.0, 1.1.1

TestHBaseFsck#testParallelWithRetriesHbck failed intermittently (especially in Windows environment)
with "java.io.IOException: Duplicate hbck - Abort"

java.util.concurrent.ExecutionException: java.io.IOException: Duplicate hbck - Abort
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
	at java.util.concurrent.FutureTask.get(FutureTask.java:111)
	at org.apache.hadoop.hbase.util.TestHBaseFsck.testParallelWithRetriesHbck(TestHBaseFsck.java:644)
Caused by: java.io.IOException: Duplicate hbck - Abort
	at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:484)
	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:53)
	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:43)
	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:38)
	at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:635)
	at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:628)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)

HBASE-13591 tried to address this issue.  It did improve the pass rate in Linux environment
(after the fix, I could not repro in my machine); however, the test still failed intermittently
in Windows environment during testing of 1.1 release.

Looking at the code, it uses the ExponentialBackoffPolicy (starting with 200ms sleep time
after first failed attempt to acquire the lock in ZK, then 400ms, then 800ms, etc.) in between
retries.  Therefore, even the first hbck run completes, the second hbck run would still fail
due to long sleep time.  

the proposal to fix the problem is to use ExponentialBackoffPolicyWithLimit and cap the max
sleep time to some small number (eg. 5 seconds, it should be configurable).  This would make
the test more robust.  

This message was sent by Atlassian JIRA

View raw message