hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-20087) Periodically attempt redeploy of regions in FAILED_OPEN state
Date Wed, 28 Feb 2018 00:12:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-20087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379529#comment-16379529 ]

Andrew Purtell commented on HBASE-20087:
----------------------------------------

Going to want HBASE-20102 for this change

> Periodically attempt redeploy of regions in FAILED_OPEN state
> -------------------------------------------------------------
>
>                 Key: HBASE-20087
>                 URL: https://issues.apache.org/jira/browse/HBASE-20087
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Major
>             Fix For: 2.0.0, 1.5.0
>
>         Attachments: 0001-W-4723090-Port-the-RIT-FAILED_OPEN-state-hack-from-R.patch,
HBASE-20087-branch-1.patch, HBASE-20087-branch-1.patch
>
>
> Because RSGroups can cause permanent RIT with regions in FAILED_OPEN state, we added
> logic to the master portion of the RSGroups extension to enumerate RITs and retry assignment
> of regions in FAILED_OPEN state.
> However, this strategy can be applied generally to reduce the need for operator involvement
> in cluster operations. Today an operator has to resolve FAILED_OPEN assignments manually, but
> there is little risk in automatically retrying them after a while. If the reason the assignment
> failed has not cleared, the assignment will just fail again. If the reason has been resolved,
> the retry succeeds and operators don't have to do anything more for the cluster to fully
> heal.
>  
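The retry strategy described in the issue can be sketched in plain Java. This is only an illustration under stated assumptions, not the attached patch and not the HBase API: the names `FailedOpenRetrySketch`, `Regions`, `markFailedOpen`, `retryFailedOpen`, and the `canOpen` predicate are all hypothetical stand-ins for the master's regions-in-transition bookkeeping and for whatever condition caused the open to fail.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical toy model; none of these types exist in HBase.
class FailedOpenRetrySketch {
  enum State { OPEN, FAILED_OPEN }

  static final class Regions {
    private final Map<String, State> states = new LinkedHashMap<>();
    // Assumption: returns true once the cause of the failed open has cleared.
    private final Predicate<String> canOpen;

    Regions(Predicate<String> canOpen) {
      this.canOpen = canOpen;
    }

    void markFailedOpen(String region) {
      states.put(region, State.FAILED_OPEN);
    }

    State stateOf(String region) {
      return states.get(region);
    }

    // One pass of the periodic retry: enumerate regions, retry only those in
    // FAILED_OPEN, and return the names of the regions that recovered.
    // A region whose failure cause persists simply stays in FAILED_OPEN.
    List<String> retryFailedOpen() {
      List<String> recovered = new ArrayList<>();
      for (Map.Entry<String, State> e : states.entrySet()) {
        if (e.getValue() != State.FAILED_OPEN) {
          continue;
        }
        if (canOpen.test(e.getKey())) {
          e.setValue(State.OPEN);
          recovered.add(e.getKey());
        }
      }
      return recovered;
    }
  }

  public static void main(String[] args) {
    Set<String> healthy = new HashSet<>();
    Regions regions = new Regions(healthy::contains);
    regions.markFailedOpen("region-a");

    // First pass: the cause has not cleared, so the retry fails again.
    System.out.println("recovered: " + regions.retryFailedOpen());

    // The cause clears; the next pass heals the region without operator action.
    healthy.add("region-a");
    System.out.println("recovered: " + regions.retryFailedOpen());
  }
}
```

In a real master the pass would run on a timer rather than being called by hand; the sketch leaves scheduling out so that the retry-and-fail-again behavior stays easy to follow.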



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
