hbase-issues mailing list archives

From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky
Date Wed, 03 Jun 2015 19:00:39 GMT

     [ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Yuan Jiang updated HBASE-13831:
---------------------------------------
    Fix Version/s: 1.1.1
                   1.2.0
                   2.0.0
           Status: Patch Available  (was: Open)

> TestHBaseFsck#testParallelHbck is flaky
> ---------------------------------------
>
>                 Key: HBASE-13831
>                 URL: https://issues.apache.org/jira/browse/HBASE-13831
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck, test
>    Affects Versions: 1.1.0, 2.0.0, 1.2.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>            Priority: Minor
>             Fix For: 2.0.0, 1.2.0, 1.1.1
>
>         Attachments: HBASE-13831.patch
>
>
> TestHBaseFsck#testParallelHbck is flaky when run against a HADOOP-2.6+ environment. The
> idea of the test is that when 2 HBCK operations run simultaneously, the 2nd HBCK should
> fail immediately, without retrying, because it cannot create the lock file that the 1st
> HBCK has already created. However, with HADOOP-2.6+, the FileSystem#createFile call
> internally retries on AlreadyBeingCreatedException (see HBASE-13574 for more details: "It
> seems that test is broken due of the new create retry policy in hadoop 2.6.
> Namenode proxy now created with custom RetryPolicy for AlreadyBeingCreatedException which
> is implies timeout on this operations up to HdfsConstants.LEASE_SOFTLIMIT_PERIOD (60seconds).")
> When I run TestHBaseFsck#testParallelHbck multiple times against HADOOP-2.7 in a Windows
> environment (HBase branch-1.1), the result is unpredictable: sometimes it succeeds and
> sometimes it fails, with failures outnumbering successes.
> The fix is trivial: leverage the change in HBASE-13732 and reduce the maximum wait time
> to a smaller number.
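
The race described above can be sketched in plain Java. This is a hypothetical illustration, not HBaseFsck code. It makes three assumptions:

- `createWithInternalRetry` stands in for a create call whose client keeps retrying while the file already exists, much as the Hadoop 2.6+ retry policy does for AlreadyBeingCreatedException.
- `tryLock` caps the caller's total wait with `Future.get(timeout)`, which is the kind of bound the fix shrinks.
- The class name, method names, lock file name and timings are all illustrative choices made to keep the demo fast.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not HBaseFsck code: two "hbck" runs compete for one lock file.
public class BoundedLockAttempt {

    // Stand-in (assumption) for a create call whose client retries internally while
    // the file already exists, like the Hadoop 2.6+ retry on AlreadyBeingCreatedException.
    static boolean createWithInternalRetry(Path lock, long retryWindowMs, long sleepMs)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + retryWindowMs;
        while (true) {
            try {
                Files.createFile(lock); // atomic; throws if the file already exists
                return true;
            } catch (FileAlreadyExistsException e) {
                if (System.currentTimeMillis() >= deadline) {
                    return false; // the internal retry window is exhausted
                }
                Thread.sleep(sleepMs); // a retry the caller cannot see
            }
        }
    }

    // Caps the caller's total wait at maxWaitMs, however long the internal retries last.
    static boolean tryLock(Path lock, long retryWindowMs, long maxWaitMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Boolean> attempt = () -> createWithInternalRetry(lock, retryWindowMs, 50);
        Future<Boolean> future = executor.submit(attempt);
        try {
            return future.get(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the still-retrying create
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("lock attempt failed", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Path lock = Paths.get(System.getProperty("java.io.tmpdir"),
                "hbck-demo-" + System.nanoTime() + ".lock");
        try {
            // First "hbck" creates the lock file immediately.
            System.out.println("first:  " + tryLock(lock, 2_000, 500)); // prints "first:  true"
            // Second "hbck" would retry for the full 2 s window; the 300 ms cap ends it early.
            long start = System.nanoTime();
            boolean second = tryLock(lock, 2_000, 300);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("second: " + second + " after ~" + elapsedMs + " ms");
        } finally {
            lock.toFile().delete(); // clean up the demo lock file
        }
    }
}
```

The second attempt fails fast only because `maxWaitMs` is smaller than the retry window. With a cap larger than the window, it would still return false, but only after the whole window had elapsed.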



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
