hbase-issues mailing list archives

From "Jimmy Xiang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-9514) Prevent region from assigning before log splitting is done
Date Sun, 29 Sep 2013 21:26:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781502#comment-13781502 ]

Jimmy Xiang edited comment on HBASE-9514 at 9/29/13 9:25 PM:
-------------------------------------------------------------

Here is the list of changes:
1. fixed a bug in AM#assign (line ~2645): when a bulk assign fails, each region should be
assigned again; otherwise, the regions get stuck in transition;
2. fixed a bug in AM#unassign (line ~2461): if the region is offline, assign it again (moved to
the finally block, so all scenarios are covered);
3. in RegionStates, if the last hosting region server is online, get the server's info to confirm
it has the expected start code (may be too conservative; I haven't hit this case in my tests yet);
4. in AM, when forcing the region state to offline with a forced new plan, check meta to make sure
the last assignment has not changed (may be too conservative; I haven't hit this case in my tests yet);
5. enhanced bulk assign slightly so that a region that is already assigned does not need to be
force-assigned.
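To make changes 1 and 5 concrete, here is a minimal sketch under simplifying assumptions. It is a toy in-memory model, not the patch: the class `BulkAssignSketch`, its `State` enum, `bulkAssign`, `assignOne` and the `bulkCallFails` flag are hypothetical names, not HBase's AssignmentManager API. Regions already OPEN are skipped (change 5), and when the bulk call fails every remaining region is assigned again one by one instead of being left PENDING_OPEN (change 1).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of changes 1 and 5; these names are NOT
 * HBase's AssignmentManager API.
 */
final class BulkAssignSketch {
  enum State { OFFLINE, PENDING_OPEN, OPEN }

  // region name -> state; a stand-in for RegionStates
  final Map<String, State> states = new LinkedHashMap<>();
  // simulates the bulk open call to the target server failing
  boolean bulkCallFails;
  // how many regions fell back to a single-region assign
  int singleAssigns;

  /** Change 5: regions that are already OPEN are skipped, not force-assigned. */
  List<String> needsAssignment(List<String> regions) {
    List<String> todo = new ArrayList<>();
    for (String region : regions) {
      if (states.get(region) != State.OPEN) {
        todo.add(region);
      }
    }
    return todo;
  }

  /** Change 1: if the bulk call fails, assign each region again individually. */
  void bulkAssign(List<String> regions) {
    List<String> todo = needsAssignment(regions);
    for (String region : todo) {
      states.put(region, State.PENDING_OPEN);  // now in transition
    }
    if (!bulkCallFails) {
      for (String region : todo) {
        states.put(region, State.OPEN);
      }
      return;
    }
    // Without this fallback the regions would stay PENDING_OPEN (stuck in transition).
    for (String region : todo) {
      assignOne(region);
    }
  }

  /** Hypothetical single-region assign; assumed to succeed in this model. */
  void assignOne(String region) {
    singleAssigns++;
    states.put(region, State.OPEN);
  }

  public static void main(String[] args) {
    BulkAssignSketch am = new BulkAssignSketch();
    am.states.put("a", State.OPEN);
    am.states.put("b", State.OFFLINE);
    am.bulkCallFails = true;
    am.bulkAssign(List.of("a", "b", "c"));
    // prints {a=OPEN, b=OPEN, c=OPEN} singleAssigns=2
    System.out.println(am.states + " singleAssigns=" + am.singleAssigns);
  }
}
```

A real single-region assign can itself fail and retry with a new plan; the model assumes it succeeds so the fallback path stays visible.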

I have a new patch in testing now (v5.1 attached). The new patch has the following changes:
1. added a CM (chaos monkey) action to log the cluster status every 90 seconds, so we know the
details of regions in transition;
2. added an hbck check after a verification failure, so we know whether the cluster is consistent,
i.e., whether any region is lost or unassigned;
3. added another verification with CM disabled after a verification failure, so we know whether
we really have data loss.
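A periodic status-logging action like change 1 could look roughly like the sketch below. It assumes a plain `ScheduledExecutorService`; `PeriodicStatusLogger`, `statusSource` and `sink` are hypothetical names, not HBase's chaos monkey action API, and the status string is whatever the caller supplies (for example, a summary of regions in transition).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical periodic "log cluster status" action; not HBase's chaos monkey API. */
final class PeriodicStatusLogger implements AutoCloseable {
  static final long PERIOD_SECONDS = 90;  // interval mentioned in the comment

  private final Supplier<String> statusSource;  // e.g. summarizes regions in transition
  private final Consumer<String> sink;          // e.g. a logger
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  PeriodicStatusLogger(Supplier<String> statusSource, Consumer<String> sink) {
    this.statusSource = statusSource;
    this.sink = sink;
  }

  /** Fetches and emits one status line; a failed fetch is logged rather than thrown. */
  void logOnce() {
    String line;
    try {
      line = statusSource.get();
    } catch (RuntimeException e) {
      line = "failed to fetch cluster status: " + e.getMessage();
    }
    sink.accept(line);
  }

  /** Runs logOnce() every PERIOD_SECONDS, starting one period from now. */
  void start() {
    // Catching inside logOnce() matters: scheduleAtFixedRate suppresses all
    // later runs once a single execution throws.
    scheduler.scheduleAtFixedRate(this::logOnce, PERIOD_SECONDS, PERIOD_SECONDS, TimeUnit.SECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) {
    try (PeriodicStatusLogger logger =
             new PeriodicStatusLogger(() -> "regions in transition: 0", System.out::println)) {
      logger.logOnce();  // prints "regions in transition: 0"
    }
  }
}
```

Swallowing fetch errors is a deliberate choice here: a status logger that dies on the first master hiccup would stop reporting exactly when the details are most useful.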

It seems that there is no data loss now: check 3 passes even though the test still fails.
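The reasoning in the sentence above can be written as a small decision table over the three checks. This is an illustrative sketch only: `FailureDiagnosis`, `Verdict` and `diagnose` are hypothetical names, and the verdict for the hbck-inconsistent case is an assumption (unreadable rows may simply sit in lost or unassigned regions), not something the comment states.

```java
/** Hypothetical decision table for the diagnostic checks; names are illustrative only. */
final class FailureDiagnosis {
  enum Verdict { PASSED, FAILED_WITHOUT_DATA_LOSS, REGIONS_LOST_OR_UNASSIGNED, DATA_LOSS }

  /**
   * @param verifyWithCm    the original verification, run while chaos monkey is active
   * @param hbckConsistent  whether the hbck check found no lost/unassigned regions
   * @param verifyWithoutCm the extra verification, re-run with chaos monkey disabled
   */
  static Verdict diagnose(boolean verifyWithCm, boolean hbckConsistent, boolean verifyWithoutCm) {
    if (verifyWithCm) {
      return Verdict.PASSED;
    }
    if (verifyWithoutCm) {
      // All expected data is readable with chaos monkey off, so the failure is not data loss.
      return Verdict.FAILED_WITHOUT_DATA_LOSS;
    }
    if (!hbckConsistent) {
      // Assumption: missing rows may just sit in regions that are lost or unassigned.
      return Verdict.REGIONS_LOST_OR_UNASSIGNED;
    }
    return Verdict.DATA_LOSS;
  }

  public static void main(String[] args) {
    // The situation described in the comment: check 3 passes, the test still fails.
    System.out.println(diagnose(false, true, true));  // prints FAILED_WITHOUT_DATA_LOSS
  }
}
```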

> Prevent region from assigning before log splitting is done
> ----------------------------------------------------------
>
>                 Key: HBASE-9514
>                 URL: https://issues.apache.org/jira/browse/HBASE-9514
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Blocker
>             Fix For: 0.96.0
>
>         Attachments: trunk-9514_v1.patch, trunk-9514_v2.patch, trunk-9514_v3.patch, trunk-9514_v5.1.patch, trunk-9514_v5.patch
>
>
> If a region is assigned before log splitting is done by the server shutdown handler, the edits belonging to this region in the hlogs of the dead server will be lost.
> Generally this is not an issue if users don't assign/unassign a region from the hbase shell or via hbase admin. These commands are also marked as expert-only in the hbase shell help. However, chaos monkey doesn't care.
> If we can prevent such regions from being assigned at a bad time, it would make things a little safer.
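The guard this issue asks for can be sketched as follows, under stated assumptions: `LogSplitGuard`, `regionOpened`, `serverDied`, `logSplittingDone` and `canAssign` are hypothetical names, not HBase's AssignmentManager or ServerShutdownHandler code. A region whose last host is a dead server with unsplit logs is refused. The server identity is modeled on HBase's "host,port,startcode" server name, so a server restarted at the same host and port counts as a different instance and the old instance's regions stay blocked until its logs are split; presumably this is the distinction the start-code check in change 3 relies on.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical guard for the idea in this issue; not HBase's actual
 * AssignmentManager or ServerShutdownHandler code.
 */
final class LogSplitGuard {
  // dead servers whose logs (hlogs) have not finished splitting
  private final Set<String> splitting = ConcurrentHashMap.newKeySet();
  // region name -> server name that last hosted it
  private final Map<String, String> lastHost = new ConcurrentHashMap<>();

  /** Server identity modeled on HBase's "host,port,startcode" server name. */
  static String serverName(String host, int port, long startCode) {
    return host + "," + port + "," + startCode;
  }

  void regionOpened(String region, String server) {
    lastHost.put(region, server);
  }

  /** Called when a server is declared dead; its logs must be split before reassignment. */
  void serverDied(String server) {
    splitting.add(server);
  }

  void logSplittingDone(String server) {
    splitting.remove(server);
  }

  /** False while the region's last host still has unsplit logs; assigning then would lose edits. */
  boolean canAssign(String region) {
    String last = lastHost.get(region);
    return last == null || !splitting.contains(last);
  }

  public static void main(String[] args) {
    LogSplitGuard guard = new LogSplitGuard();
    String dead = serverName("rs1.example.com", 60020, 1000L);
    guard.regionOpened("region-1", dead);
    guard.serverDied(dead);
    System.out.println(guard.canAssign("region-1"));  // false: logs not split yet
    guard.logSplittingDone(dead);
    System.out.println(guard.canAssign("region-1"));  // true
  }
}
```

In a real master the check and the assignment would have to happen atomically (or under the region's lock); this sketch only shows the state that needs to be consulted.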



--
This message was sent by Atlassian JIRA
(v6.1#6144)