hbase-issues mailing list archives

From "Ping (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9740) A corrupt HFile could cause endless attempts to assign the region without a chance of success
Date Thu, 23 Jan 2014 03:10:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879403#comment-13879403 ]

Ping commented on HBASE-9740:

    Yes, @[~jxiang], you're right: we can't notice it from the master's status page. On the
other hand, the applications that access this table (region) will notice it. I am thinking of
putting a warning message on the page too, if that fits.
    My reasoning was this: we only consider the 0.94 branch. In this branch, if we move the region
to the FAILED_OPEN state, the AM will assign it again and again, which blocks cluster balancing.
In our production cluster we couldn't even disable the table to make a repair (none of the tools,
including close_region, hbck repair, disable table, ..., were usable). So we think we should take
this region offline for advanced repair or maintenance.
    @[~adityakishore], thanks for your help. I will check my code style, fix it, and resubmit
the patch if you and Jimmy agree with it.

> A corrupt HFile could cause endless attempts to assign the region without a chance of success
> ---------------------------------------------------------------------------------------------
>                 Key: HBASE-9740
>                 URL: https://issues.apache.org/jira/browse/HBASE-9740
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.16
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>         Attachments: patch-9740_0.94.txt
> As described in HBASE-9737, a corrupt HFile in a region could lead to an assignment storm
> in the cluster, since the Master will keep trying to assign the region to each region server
> one after another, and obviously none will succeed.
> The region server, upon detecting such a scenario, should mark the region as "RS_ZK_REGION_FAILED_ERROR"
> (or something to that effect) in ZooKeeper, which should signal the Master to stop assigning
> the region until the error has been resolved (via an HBase shell command, probably "assign"?).
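A minimal sketch of the difference between the current FAILED_OPEN handling and the proposed behavior, written as a plain Java model. Everything here is hypothetical: FAILED_ERROR only stands in for the proposed "RS_ZK_REGION_FAILED_ERROR" state, there is no ZooKeeper involved, and onReport/assign are illustrative names, not HBase's AssignmentManager API. The attempt cap exists only so the simulated storm terminates; the behavior described above has no such bound.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical model, not HBase code: how a master might react to what a
// region server reports after trying to open a region.
class ParkCorruptRegionSketch {
    // Reports a region server could publish; FAILED_ERROR stands in for the
    // proposed RS_ZK_REGION_FAILED_ERROR (unrecoverable, e.g. corrupt HFile).
    enum Report { OPENED, FAILED_OPEN, FAILED_ERROR }

    // What the master does next with the region.
    enum Decision { DONE, REASSIGN, PARK }

    static final class Outcome {
        final int attempts;
        final Decision decision;

        Outcome(int attempts, Decision decision) {
            this.attempts = attempts;
            this.decision = decision;
        }
    }

    // Master-side decision for a single report.
    static Decision onReport(Report report) {
        switch (report) {
            case OPENED:
                return Decision.DONE;
            case FAILED_OPEN:
                return Decision.REASSIGN; // try the next region server
            default:
                return Decision.PARK;     // stop assigning until an operator acts
        }
    }

    // Walks the servers round-robin until the region opens, is parked, or
    // the (simulation-only) cap is reached. tryOpen stands in for the open
    // attempt on a given region server.
    static Outcome assign(List<String> servers, Function<String, Report> tryOpen, int cap) {
        int attempts = 0;
        Decision decision = Decision.REASSIGN;
        while (decision == Decision.REASSIGN && attempts < cap) {
            String server = servers.get(attempts % servers.size());
            attempts++;
            decision = onReport(tryOpen.apply(server));
        }
        return new Outcome(attempts, decision);
    }

    public static void main(String[] args) {
        List<String> servers = List.of("rs1", "rs2", "rs3");
        // Corrupt HFile reported as plain FAILED_OPEN: reassigned until the cap.
        Outcome storm = assign(servers, s -> Report.FAILED_OPEN, 30);
        // Corrupt HFile reported as FAILED_ERROR: parked after one attempt.
        Outcome parked = assign(servers, s -> Report.FAILED_ERROR, 30);
        // After repair, an operator "assign" just runs assignment again.
        Outcome repaired = assign(servers, s -> Report.OPENED, 30);
        System.out.println(storm.attempts + " " + storm.decision);       // 30 REASSIGN
        System.out.println(parked.attempts + " " + parked.decision);     // 1 PARK
        System.out.println(repaired.attempts + " " + repaired.decision); // 1 DONE
    }
}
```

The point of the separate PARK decision is that the region leaves the assignment loop entirely, so it stops blocking balancing, and only an explicit operator action brings it back.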

This message was sent by Atlassian JIRA
