hbase-dev mailing list archives

From "Lars George (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-14129) If any regionserver gets shutdown uncleanly during full cluster restart, locality looks to be lost
Date Tue, 14 Mar 2017 13:56:41 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George resolved HBASE-14129.
    Resolution: Won't Fix

Closing as "won't fix" because the hard-coded flag is too intrusive. The cluster should be able
to handle this case once the logic in the {{AssignmentManager}} is fixed.

> If any regionserver gets shutdown uncleanly during full cluster restart, locality looks to be lost
> --------------------------------------------------------------------------------------------------
>                 Key: HBASE-14129
>                 URL: https://issues.apache.org/jira/browse/HBASE-14129
>             Project: HBase
>          Issue Type: Bug
>            Reporter: churro morales
>             Fix For: 2.0.0, 1.4.0
>         Attachments: HBASE-14129.patch
> We were doing a cluster restart the other day. Some regionservers did not shut down
cleanly, and upon restart our locality went from 99% to 5%. Looking at the code,
AssignmentManager.joinCluster() calls AssignmentManager.processDeadServersAndRegionsInTransition().
> If the failover flag gets set for any reason, it seems we don't call assignAllUserRegions().
Instead, the balancer appears to do the work of assigning those regions, and because we don't
use a locality-aware balancer, we lost our region locality.
> I don't have a solid grasp of the reasoning behind these checks, but there are some
potential workarounds:
> 1. After shutting down your cluster, move your WALs aside (to replay later).
> 2. Clean up your znodes in ZooKeeper.
> That seems to work, but it requires a lot of manual labor. Another solution, which I prefer,
would be a --clean flag for the start script: ./start-hbase.sh --clean
> If the master is started with that flag, AssignmentManager.processDeadServersAndRegionsInTransition()
checks for it and, when it is set, calls assignAllUserRegions() regardless of the failover state.
> I have a patch for the latter solution, assuming I am understanding the logic correctly.
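
The failure mode described above, and the effect of the proposed flag, can be illustrated with a toy model. This is a minimal Python sketch, not HBase code: plan_startup_assignment and locality are hypothetical names, plain round-robin stands in for "a balancer that is not locality-aware", and "region placed back on its previous host" is used as a rough proxy for HDFS block locality (assuming a region's HFiles were written on the host that served it).

```python
from typing import Dict, Set


def plan_startup_assignment(
    last_host: Dict[str, str],
    live_hosts: Set[str],
    failover: bool,
    clean: bool = False,
) -> Dict[str, str]:
    """Hypothetical model of master startup assignment (NOT the HBase implementation).

    last_host maps region name -> hostname that served it before shutdown.
    Returns region name -> hostname chosen at startup.
    """
    if not live_hosts:
        raise ValueError("no live region servers")
    hosts = sorted(live_hosts)
    # Hypothetical --clean flag: take the clean-startup path even if failover was detected.
    retain = clean or not failover
    plan: Dict[str, str] = {}
    for i, region in enumerate(sorted(last_host)):
        previous = last_host[region]
        if retain and previous in live_hosts:
            # Stand-in for the assignAllUserRegions() path: put the region back
            # on the host that served it before the restart.
            plan[region] = previous
        else:
            # Stand-in for the failover path: a balancer with no notion of
            # locality, modeled as round-robin over the live hosts.
            plan[region] = hosts[i % len(hosts)]
    return plan


def locality(plan: Dict[str, str], last_host: Dict[str, str]) -> float:
    """Fraction of regions placed on their previous host (proxy for block locality)."""
    if not plan:
        return 1.0
    return sum(plan[r] == last_host[r] for r in plan) / len(plan)


if __name__ == "__main__":
    last = {"r1": "rs1", "r2": "rs1", "r3": "rs2", "r4": "rs2"}
    live = {"rs1", "rs2"}
    print(locality(plan_startup_assignment(last, live, failover=False), last))
    print(locality(plan_startup_assignment(last, live, failover=True), last))
    print(locality(plan_startup_assignment(last, live, failover=True, clean=True), last))
```

Run as a script, the three lines print 1.0, 0.5 and 1.0. In the failover branch a region lands on its old host only by coincidence of the round-robin order. Setting the hypothetical clean flag restores the retain-previous-host behavior even though failover was detected.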

This message was sent by Atlassian JIRA
