hbase-issues mailing list archives

From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13732) TestHBaseFsck#testParallelWithRetriesHbck fails intermittently
Date Tue, 02 Jun 2015 19:34:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569619#comment-14569619 ]

Stephen Yuan Jiang commented on HBASE-13732:

[~enis] The patch you committed to branch-1 and branch-1.1 differs from the patch attached to this JIRA: it misses one place and leaves one magic number unchanged. The version in the master branch is correct.

> TestHBaseFsck#testParallelWithRetriesHbck fails intermittently
> --------------------------------------------------------------
>                 Key: HBASE-13732
>                 URL: https://issues.apache.org/jira/browse/HBASE-13732
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck, test
>    Affects Versions: 2.0.0, 1.1.0, 1.2.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>            Priority: Minor
>             Fix For: 2.0.0, 1.2.0, 1.1.1
>         Attachments: HBASE-13732.patch
> TestHBaseFsck#testParallelWithRetriesHbck failed intermittently (especially in the Windows environment) with "java.io.IOException: Duplicate hbck - Abort".
> {noformat}
> java.util.concurrent.ExecutionException: java.io.IOException: Duplicate hbck - Abort
> 	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:111)
> 	at org.apache.hadoop.hbase.util.TestHBaseFsck.testParallelWithRetriesHbck(TestHBaseFsck.java:644)
> Caused by: java.io.IOException: Duplicate hbck - Abort
> 	at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:484)
> 	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:53)
> 	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:43)
> 	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:38)
> 	at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:635)
> 	at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:628)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> {noformat}
> HBASE-13591 tried to address this issue. It did improve the pass rate in the Linux environment (after that fix, I could not reproduce the failure on my machine); however, the test still failed intermittently in the Windows environment during testing of the 1.1 release.
> Looking at the code, it uses ExponentialBackoffPolicy between retries (sleeping 200ms after the first failed attempt to acquire the lock in ZK, then 400ms, then 800ms, etc.). Therefore, even if the first hbck run completes, the second hbck run would still fail because of the long sleep time.
> The proposed fix is to use ExponentialBackoffPolicyWithLimit and cap the maximum sleep time at a small value (e.g. 5 seconds; it should be configurable). This would make the test more robust.
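To make the difference between the two retry policies concrete, here is a minimal Java sketch that computes the sleep schedule with and without a cap, using the 200ms starting sleep and the 5-second example cap from the description. It is NOT HBase's RetryCounter, ExponentialBackoffPolicy, or ExponentialBackoffPolicyWithLimit code: the class and method names (BackoffSketch, exponentialSleepMs, cappedExponentialSleepMs, schedule), the plain doubling without jitter, and the retry count of 8 are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: uncapped vs. capped exponential backoff.
 * Not HBase code; names and the retry count are illustrative assumptions.
 */
public class BackoffSketch {

    /** Sleep before retry number 'attempt' (0-based): base * 2^attempt, saturating. */
    static long exponentialSleepMs(long baseSleepMs, int attempt) {
        if (baseSleepMs <= 0 || attempt < 0) {
            throw new IllegalArgumentException("baseSleepMs must be > 0 and attempt >= 0");
        }
        // Shifting further than this would overflow a signed long, so saturate instead.
        if (attempt >= Long.numberOfLeadingZeros(baseSleepMs)) {
            return Long.MAX_VALUE;
        }
        return baseSleepMs << attempt; // 200, 400, 800, 1600, ...
    }

    /** Same schedule, but never sleeps longer than maxSleepMs (the "with limit" idea). */
    static long cappedExponentialSleepMs(long baseSleepMs, int attempt, long maxSleepMs) {
        return Math.min(maxSleepMs, exponentialSleepMs(baseSleepMs, attempt));
    }

    /** Sleeps for the given number of retries; pass Long.MAX_VALUE for "no cap". */
    static List<Long> schedule(long baseSleepMs, int retries, long maxSleepMs) {
        List<Long> sleeps = new ArrayList<>();
        for (int attempt = 0; attempt < retries; attempt++) {
            sleeps.add(cappedExponentialSleepMs(baseSleepMs, attempt, maxSleepMs));
        }
        return sleeps;
    }

    public static void main(String[] args) {
        long base = 200;   // first sleep after a failed lock attempt (from the description)
        long cap = 5_000;  // example cap from the description; it should be configurable
        int retries = 8;   // hypothetical retry count, for illustration only

        // prints "uncapped: [200, 400, 800, 1600, 3200, 6400, 12800, 25600]"
        System.out.println("uncapped: " + schedule(base, retries, Long.MAX_VALUE));
        // prints "capped:   [200, 400, 800, 1600, 3200, 5000, 5000, 5000]"
        System.out.println("capped:   " + schedule(base, retries, cap));
    }
}
```

With the cap, the waiting hbck keeps checking for the lock at least every 5 seconds instead of sleeping for ever-longer intervals, so it can pick up the lock sooner after the first run releases it.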

This message was sent by Atlassian JIRA
