hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-15959) Fix flaky test TestRegionServerMetrics.testMobMetrics
Date Wed, 08 Jun 2016 07:55:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320168#comment-15320168
] 

Jingcheng Du edited comment on HBASE-15959 at 6/8/16 7:55 AM:
--------------------------------------------------------------

Thanks [~appy], you are right. The patch only fixes the issue in this test; it does not fix
the underlying bug that the test uncovered.
bq. I don't know much about MOB, but from the looks, it seems that the test has uncovered
a bug here
What I gave in the example is not specific to MOB; it is how compactors behave in the master
branch for all kinds of compactions.
In the current master code, no compactor (MOB or otherwise) archives the old/compacted
store files itself; a region server chore does it instead. But this cleanup is not done when the
region is closed. I had guessed that the WAL would help redo this when the region starts, but in
the test I did not see that happen. The leftover compacted files are still there after the region
is restarted (I did this by disabling and re-enabling the tables).
Fortunately, the scan results are still correct for both non-MOB and MOB tables, but metrics for
MOB tables might be wrong because of the leftover files. I am looking into why the cleanup is not
done before close.
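
To make the behaviour described above concrete, here is a minimal toy model in Java. It is not HBase code: ToyStore and all of its methods are hypothetical names invented for illustration, and it only assumes what the comment states, namely that a compaction leaves its input files in place, that a separate chore archives them later, and that closing the region without running that cleanup leaves them behind.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; not HBase code. Every name here is hypothetical. */
public class ToyStore {
    private final List<String> liveFiles = new ArrayList<>();      // files readers should use
    private final List<String> compactedFiles = new ArrayList<>(); // compaction inputs, not yet archived
    private final List<String> archive = new ArrayList<>();
    private int nextId = 0;

    /** Each flush adds one new store file. */
    public void flush() {
        liveFiles.add("hfile-" + nextId++);
    }

    /** A compaction writes one new file and only marks its inputs as compacted. */
    public void compact() {
        compactedFiles.addAll(liveFiles);
        liveFiles.clear();
        liveFiles.add("hfile-" + nextId++);
    }

    /** The periodic chore is the only place that moves compacted files to the archive. */
    public void runCleanerChore() {
        archive.addAll(compactedFiles);
        compactedFiles.clear();
    }

    /** Close the store; the cleanup runs only if the caller asks for it. */
    public void close(boolean cleanBeforeClose) {
        if (cleanBeforeClose) {
            runCleanerChore();
        }
    }

    /** Files still present in the (modelled) store directory. */
    public int filesInStoreDir() {
        return liveFiles.size() + compactedFiles.size();
    }

    public int archivedCount() {
        return archive.size();
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        for (int i = 0; i < 5; i++) {
            store.flush();
        }
        store.compact();
        store.close(false); // the chore has not run yet
        System.out.println("files left without cleanup: " + store.filesInStoreDir());

        store.close(true);  // cleanup before close
        System.out.println("files left with cleanup: " + store.filesInStoreDir());
    }
}
```

In this model, closing without cleanup leaves the five compaction inputs next to the one compacted file, which is the kind of leftover state that could skew file-based metrics after the region reopens.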


was (Author: jingcheng.du@intel.com):
Thanks [~appy], you are right. The patch only fixes the issue in this test; it does not fix
the underlying bug that the test uncovered.
bq. I don't know much about MOB, but from the looks, it seems that the test has uncovered
a bug here
What I gave in the example is not specific to MOB; it is how compactors behave in the master
branch for all kinds of compactions.
In the current master code, no compactor (MOB or otherwise) archives the old/compacted
store files itself; a region server chore does it instead. But this cleanup is not done when the
region is closed. I had guessed that the WAL would help redo this when the region starts, but in
the test I did not see that happen. The leftover compacted files are still there after the region
is restarted (I did this by disabling and re-enabling the tables).
Fortunately, the scan results are still correct for both non-MOB and MOB tables, but metrics for
MOB tables might be wrong because of the leftover files. I am looking into why the cleanup is not
done after restarting.

> Fix flaky test TestRegionServerMetrics.testMobMetrics
> -----------------------------------------------------
>
>                 Key: HBASE-15959
>                 URL: https://issues.apache.org/jira/browse/HBASE-15959
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Appy
>            Assignee: huaxiang sun
>         Attachments: HBASE-15959-v001.patch, HBASE-15959-v002.patch, HBASE-15959-v003.patch
>
>
> It flakes [here|https://github.com/apache/hbase/blob/b557f0bec62a48753e5d01d7a47f3c9e5a6b3ee8/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java#L460].
> There are two weird things I identified:
> 1. In the second compaction, the [scanner|https://github.com/apache/hbase/blob/b557f0bec62a48753e5d01d7a47f3c9e5a6b3ee8/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/DefaultMobStoreCompactor.java#L173]
has 10 storefiles. Shouldn't there be 6: 5 from recent flushes and 1 from the earlier compaction?
Probably this is because the MOB cleaner doesn't clean old hfiles. Does this need fixing?
> 2. Across runs, the same cell (i.e. same key) may or may not be considered a MOB reference cell
[here|https://github.com/apache/hbase/blob/b557f0bec62a48753e5d01d7a47f3c9e5a6b3ee8/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/DefaultMobStoreCompactor.java#L213].
This at least happens with row keys 0 - 4 (which got compacted earlier). [~jmhsieh] Any ideas
why this would happen?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
