hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15192) TestRegionMergeTransactionOnCluster#testCleanMergeReference is flaky
Date Mon, 01 Feb 2016 17:25:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126599#comment-15126599 ]

Ted Yu commented on HBASE-15192:
--------------------------------

Since the test fails if merge references are not cleaned, we can call admin.runCatalogScan()
more than once if needed.
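
A minimal sketch of the retry idea, not the attached patch. It assumes the scan call returns the number of entries it cleaned; the IntSupplier is a hypothetical stand-in for admin.runCatalogScan(), and the class name, attempt limit, and sleep interval are made up for illustration.

```java
import java.util.function.IntSupplier;

public class CatalogScanRetry {
    /**
     * Calls the scan until it reports at least one cleaned entry or the
     * attempts run out. 'scan' stands in for admin.runCatalogScan()
     * (hypothetical wiring; the real call may throw IOException).
     */
    public static int scanUntilCleaned(IntSupplier scan, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        int cleaned = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            cleaned = scan.getAsInt();
            if (cleaned > 0) {
                break; // something was cleaned, no need to scan again
            }
            Thread.sleep(sleepMillis); // give the cluster time before the next scan
        }
        return cleaned;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated scan: nothing cleaned on the first two calls, one entry on the third.
        int[] calls = {0};
        int cleaned = scanUntilCleaned(() -> ++calls[0] < 3 ? 0 : 1, 5, 10);
        System.out.println("cleaned=" + cleaned + " after " + calls[0] + " calls");
    }
}
```

The test would then assert on the returned value as before; only the number of scan attempts changes.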

runCatalogScan() is the only method exposed for CatalogJanitor; otherwise we could poll CatalogJanitor
for the value of mergeCleaned and pass the test once mergeCleaned crosses 1.

Patch v2 passes 30 iterations of test runs. Previously the test failed within the first 5
iterations.

> TestRegionMergeTransactionOnCluster#testCleanMergeReference is flaky
> --------------------------------------------------------------------
>
>                 Key: HBASE-15192
>                 URL: https://issues.apache.org/jira/browse/HBASE-15192
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: HBASE-15192.v1.patch
>
>
> TestRegionMergeTransactionOnCluster#testCleanMergeReference fails intermittently due
> to a failed assertion on the cleaned merge region count:
> {code}
> testCleanMergeReference(org.apache.hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster)  Time elapsed: 64.183 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at org.apache.hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster.testCleanMergeReference(TestRegionMergeTransactionOnCluster.java:284)
> {code}
> Before calling CatalogJanitor#scan(), the test does:
> {code}
>       int newcount1 = 0;
>       while (System.currentTimeMillis() < timeout) {
>         for(HColumnDescriptor colFamily : columnFamilies) {
>           newcount1 += hrfs.getStoreFiles(colFamily.getName()).size();
>         }
>         if(newcount1 <= 1) {
>           break;
>         }
>         Thread.sleep(50);
>       }
> {code}
> newcount1 is not reset at the beginning of each loop iteration.
> This means that if the check newcount1 <= 1 doesn't pass on the first iteration,
> it won't pass on subsequent iterations either, because the count only accumulates.
> After the timeout is exhausted, admin.runCatalogScan() is called. However, there is a chance
> that CatalogJanitor#scan() has already been called by the Chore (during the wait period),
> leaving the cleaned count at 0 and failing the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)