hbase-issues mailing list archives

From "Umesh Agashe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18366) Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
Date Tue, 01 Aug 2017 18:33:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109472#comment-16109472 ]

Umesh Agashe commented on HBASE-18366:

bq. Why this change sir: optional ServerName destination_server = 3;

When destination_server is not specified, the LoadBalancer selects it. This is useful when a
region must be moved off an RS but the target server should be chosen by the load balancer.
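A rough sketch of that fallback, not the actual patch: server names are plain strings here, and MoveTargetSketch, resolveDestination() and balancerPick() are hypothetical names used only for illustration. The real LoadBalancer selection is of course more involved than picking the first online server.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class MoveTargetSketch {
    /** Hypothetical stand-in for the load balancer: picks the first online server. */
    static String balancerPick(List<String> onlineServers) {
        if (onlineServers.isEmpty()) {
            throw new IllegalStateException("no online servers to choose from");
        }
        return onlineServers.get(0);
    }

    /** Use the requested destination if present; otherwise let the balancer choose. */
    static String resolveDestination(Optional<String> requested, List<String> onlineServers) {
        return requested.orElseGet(() -> balancerPick(onlineServers));
    }

    public static void main(String[] args) {
        List<String> online = Arrays.asList("rs1", "rs2");
        // Destination given explicitly by the caller.
        System.out.println(resolveDestination(Optional.of("rs2"), online));
        // Destination omitted, so the stand-in balancer decides.
        System.out.println(resolveDestination(Optional.<String>empty(), online));
    }
}
```

Keeping the field optional means callers that only care about moving a region off a server do not have to pick a target themselves.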

[~stack] and I discussed this JIRA, the unit test TestServerCrashProcedure, and the patches
uploaded here. We identified the following areas for improving code/fixing bugs:

* Currently UnassignProcedure returns success when the server carrying a region is not online.
The assumption here is that ServerCrashProcedure will handle splitting logs etc. for these regions.
When UnassignProcedure completes, MoveRegionProcedure resumes with AssignProcedure, which
can then assign the region without the prerequisite steps having run. The fix is to fail
UnassignProcedure and the parent MoveRegionProcedure if the source server is not online.
* Embed the logic of selecting the highest-versioned region server for system table regions in AssignmentManager.processAssignQueue().
This way, wherever in the code system table regions are assigned or reassigned, only the
highest-versioned RSs are considered as target servers.
* As ServerCrashProcedure handles reassignment of regions on a crashed server, don't process
those regions through a call to AssignmentManager.checkIfShouldMoveSystemRegionAsync().
* Modify the LoadBalancer implementation to treat the highest-versioned region servers as
favorites for system table regions.
* Look into ServerManager refactoring to make isServerOnline() and isServerDead() mutually
exclusive.
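
To make two of the items above concrete, here is a minimal sketch (not the actual patch) of failing an unassign when the source server is offline and of restricting system-region targets to the highest-versioned servers. Everything in it is an assumption for illustration: the class and method names are hypothetical, servers are plain strings, and versions are assumed to be purely numeric dotted strings such as "2.0.0".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SystemRegionTargetSketch {
    /** Compares numeric dotted versions component by component (assumes digits only). */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Returns only the servers running the highest version, in map iteration order. */
    static List<String> highestVersioned(Map<String, String> versionByServer) {
        String best = null;
        for (String v : versionByServer.values()) {
            if (best == null || compareVersions(v, best) > 0) {
                best = v;
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : versionByServer.entrySet()) {
            if (compareVersions(e.getValue(), best) == 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Hypothetical precondition: an unassign may only succeed if its source server is online. */
    static boolean unassignMayProceed(String sourceServer, Set<String> onlineServers) {
        return onlineServers.contains(sourceServer);
    }

    public static void main(String[] args) {
        Map<String, String> versions = new LinkedHashMap<>();
        versions.put("rs1", "1.4.0");
        versions.put("rs2", "2.0.0");
        versions.put("rs3", "2.0.0");
        // Candidate targets for system table regions.
        System.out.println(highestVersioned(versions));
        // rs9 is not online, so the unassign (and the parent move) should fail.
        System.out.println(unassignMayProceed("rs9", versions.keySet()));
    }
}
```

Putting the version filter in one place is the point of the second item: every code path that assigns system table regions would then see the same candidate set.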

All these issues are related to AMv2. I will create JIRAs to track them.

Thanks, Umesh

> Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
> ---------------------------------------------------------------------------------------------------------
>                 Key: HBASE-18366
>                 URL: https://issues.apache.org/jira/browse/HBASE-18366
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Umesh Agashe
>            Assignee: Umesh Agashe
>            Priority: Blocker
>             Fix For: 2.0.0
>         Attachments: hbase-18366.fix1.patch, hbase-18366.fix2.patch
> It worked for a few days after enabling it with HBASE-18278. But started failing after
> 6786b2b
> 68436c9
> 75d2eca
> 50bb045
> df93c13
> It works with one commit before: c5abb6c. Need to see what changed with those commits.
> Currently it fails with TableNotFoundException.

