hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18366) Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
Date Fri, 14 Jul 2017 21:36:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088121#comment-16088121 ]

stack commented on HBASE-18366:

bq. Initially all RS are at the same version i.e. 3.0.0-SNAPSHOT. HMaster.getRegionServerVersion()
returns version 0.0.0 for dead RS (carrying meta)....MoveRegionProcedure to move meta region
from RS with version 0.0.0 to one of other RS with latest version.

This is good. We have double the procedures working on reassign now.

bq. I found that server can be online and dead at the same time!

Good one, [~uagashe]. This is a 'hole'.

On the change, it looks good to me. I wonder, though, how something went into the dead server
list without being pulled from the online list. That seems like a backdoor we want to close.

I can disable for now but will not resolve this issue. I like pulling the checkIfShouldMoveSystemRegionAsync
logic handling into your new procedure, HBASE-18261. It would be good to figure out why adding
a server to the dead list does not purge it from the online list. Is it because the crash
procedure has not processed it yet? How did it get into the dead list then?
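
To make the invariant in question concrete, here is a minimal sketch, not HBase code. The class
ServerTracker and its methods are hypothetical names invented for illustration; they are not
the master's actual server-tracking classes. The sketch assumes a single lock guards both lists,
so a server cannot be added to the dead list without leaving the online list in the same step.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical tracker (illustration only, not an HBase class).
 * Both sets are guarded by the same monitor so that a server is
 * never observable as online and dead at the same time.
 */
final class ServerTracker {
    private final Set<String> online = new HashSet<>();
    private final Set<String> dead = new HashSet<>();

    /** Registers a server as online, clearing any stale dead entry. */
    synchronized void markOnline(String serverName) {
        dead.remove(serverName);
        online.add(serverName);
    }

    /** Moves a server to the dead set and drops it from the online set in one step. */
    synchronized void markDead(String serverName) {
        online.remove(serverName);
        dead.add(serverName);
    }

    synchronized boolean isOnline(String serverName) {
        return online.contains(serverName);
    }

    synchronized boolean isDead(String serverName) {
        return dead.contains(serverName);
    }

    /** Returns true when no server appears in both sets. */
    synchronized boolean invariantHolds() {
        return Collections.disjoint(online, dead);
    }
}
```

With this shape, the 'backdoor' would be any code path that touches one set without going through
markOnline or markDead; a check like invariantHolds() in a test is one way to catch such a path.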

Thanks [~uagashe]

> Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
> ---------------------------------------------------------------------------------------------------------
>                 Key: HBASE-18366
>                 URL: https://issues.apache.org/jira/browse/HBASE-18366
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Umesh Agashe
>            Assignee: Umesh Agashe
> It worked for a few days after enabling it with HBASE-18278. But started failing after
> 6786b2b
> 68436c9
> 75d2eca
> 50bb045
> df93c13
> It works with one commit before: c5abb6c. Need to see what changed with those commits.
> Currently it fails with TableNotFoundException.

This message was sent by Atlassian JIRA
