hbase-issues mailing list archives

From "Pankaj Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17704) Regions stuck in FAILED_OPEN when HDFS blocks are missing
Date Tue, 28 Feb 2017 02:22:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887093#comment-15887093 ]

Pankaj Kumar commented on HBASE-17704:
--------------------------------------

[~apurtell], could we have a chore service that tries to recover regions that have been
in transition for a long time (say, more than 10 minutes)?

I think such a chore would be useful in some situations for reassigning regions that are
stuck indefinitely in the FAILED_OPEN/FAILED_CLOSE state.
In this JIRA's scenario, for example, the DNs came back up after some time, but the HM still
couldn't reassign the regions.
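
A minimal sketch of what such a chore could look like, only to illustrate the idea. It is
not HBase code: the class, the state enum, the track/untrack methods and the reassigner
hook are all hypothetical names, and the 10 minute threshold is just the value suggested
above. The chore looks for regions that have sat in FAILED_OPEN/FAILED_CLOSE longer than
the threshold and asks for them to be reassigned.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: none of these types or methods are HBase classes.
final class StuckRegionChoreSketch {

    public enum State { OPENING, CLOSING, FAILED_OPEN, FAILED_CLOSE }

    // A region's current transition state and when it entered that state.
    private static final class Tracked {
        final State state;
        final Instant since;
        Tracked(State state, Instant since) { this.state = state; this.since = since; }
    }

    private final Map<String, Tracked> inTransition = new ConcurrentSkipListMap<>();
    private final Duration threshold;          // e.g. 10 minutes, as suggested above
    private final Consumer<String> reassigner; // assumed hook that triggers a reassign

    StuckRegionChoreSketch(Duration threshold, Consumer<String> reassigner) {
        this.threshold = threshold;
        this.reassigner = reassigner;
    }

    // Record that a region entered the given state at the given time.
    void track(String regionName, State state, Instant since) {
        inTransition.put(regionName, new Tracked(state, since));
    }

    // Forget a region once it has left transition (opened or closed successfully).
    void untrack(String regionName) {
        inTransition.remove(regionName);
    }

    // Regions in FAILED_OPEN/FAILED_CLOSE for longer than the threshold, sorted by name.
    List<String> findStuck(Instant now) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Tracked> e : inTransition.entrySet()) {
            Tracked r = e.getValue();
            boolean failed = r.state == State.FAILED_OPEN || r.state == State.FAILED_CLOSE;
            if (failed && Duration.between(r.since, now).compareTo(threshold) > 0) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    // One chore run: request reassignment of each stuck region and restart its clock,
    // so a region that keeps failing is retried once per threshold, not on every run.
    List<String> chore(Instant now) {
        List<String> stuck = findStuck(now);
        for (String name : stuck) {
            reassigner.accept(name);
            inTransition.computeIfPresent(name, (k, old) -> new Tracked(old.state, now));
        }
        return stuck;
    }

    // Run the chore periodically on a caller-supplied scheduler.
    void start(ScheduledExecutorService pool, Duration period) {
        pool.scheduleAtFixedRate(() -> chore(Instant.now()),
            period.toMillis(), period.toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

Restarting the clock after each attempt is one possible design choice; it keeps the chore
from hammering a region whose blocks are still missing while still retrying it once the
DNs are back.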

> Regions stuck in FAILED_OPEN when HDFS blocks are missing
> ---------------------------------------------------------
>
>                 Key: HBASE-17704
>                 URL: https://issues.apache.org/jira/browse/HBASE-17704
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.1.8
>            Reporter: Mathias Herberts
>
> We recently experienced the loss of a whole rack (6 DNs + RS) in a 120 node cluster.
> This led to the regions hosted on the 6 unavailable RSs being reassigned to live RSs.
> When attempting to open some of the reassigned regions, some RSs encountered missing
> blocks and issued "No live nodes contain current block Block locations", putting the
> regions in state FAILED_OPEN.
> Once the lost DNs came back online, the regions were left in FAILED_OPEN, needing
> a restart of all the affected RSs to solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
