hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-2486) Add simple "anti-entropy" for region assignment
Date Wed, 01 Sep 2010 17:47:54 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2486:

    Fix Version/s: 0.92.0
                       (was: 0.90.0)

We need this but I think the urgency has had some of the air let out of it.  Punting to 0.92
for now.  Pull back in if the new bugs that the master rewrite has introduced require what is
described below.

> Add simple "anti-entropy" for region assignment
> -----------------------------------------------
>                 Key: HBASE-2486
>                 URL: https://issues.apache.org/jira/browse/HBASE-2486
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.20.5
>            Reporter: Todd Lipcon
>            Assignee: Eugene Koontz
>             Fix For: 0.92.0
>         Attachments: hbase2486.diff, hbase2486.diff
> We've seen a number of bugs where a region server thinks it should not be serving a region,
> but the master and META think it should be. I'd like to propose a very simple way of fixing
> this issue:
> 1) whenever a regionserver throws a NotServingRegionException, it also marks that region
>    id in an RS-wide Set
> 2) when a regionserver sends a heartbeat, include a message for each of these regions,
>    MSG_REPORT_NSRE or somesuch, and then clear the set
> 3) when the master receives MSG_REPORT_NSRE, it does the following checks:
>    a) if the region is assigned elsewhere according to META, the NSRE was due to a stale
>       client, ignore
>    b) if the region is in transition, ignore
>    c) otherwise, we have an inconsistency, and we should take some steps to resolve (eg
>       mark the region unassigned, or exit the master if we are in "paranoid mode")
> Whatever we do, we need to make sure that this is loudly logged, and causes unit tests
> to fail, when it's detected. This should *not* happen, but when it does, it would be good
> to recover without addtable.rb, etc.
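
The three steps in the description can be sketched roughly as follows. This is only an
illustration of the proposed flow, not the attached patch and not HBase's actual API: the
class NsreAntiEntropySketch, the NsreTracker and MasterView types, the Verdict enum, and
the use of plain strings for region and server names are all assumptions made for the
example. It also uses current Java library calls for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the NSRE report/check flow; none of these names exist in HBase. */
final class NsreAntiEntropySketch {

    /** Possible outcomes when the master examines one reported region (step 3). */
    enum Verdict { STALE_CLIENT, IN_TRANSITION, INCONSISTENT }

    /** Region server side: steps 1 and 2. */
    static final class NsreTracker {
        // RS-wide set of region names for which an NSRE was thrown since the last heartbeat.
        private final Set<String> pending = ConcurrentHashMap.newKeySet();

        /** Step 1: call this wherever a NotServingRegionException is thrown. */
        void recordNsre(String regionName) {
            pending.add(regionName);
        }

        /** Step 2: hand the recorded regions to the next heartbeat and clear them. */
        List<String> drainForHeartbeat() {
            List<String> drained = new ArrayList<>();
            for (String region : pending) {
                // remove() returns true only for the caller that actually removed the entry,
                // so a region recorded concurrently is reported now or on the next heartbeat.
                if (pending.remove(region)) {
                    drained.add(region);
                }
            }
            return drained;
        }
    }

    /** Assumed read-only view of what the master knows; a stand-in for META and RIT state. */
    interface MasterView {
        /** Server that META lists for the region, if any. */
        Optional<String> serverInMeta(String regionName);

        /** Whether the region is currently in transition. */
        boolean isInTransition(String regionName);
    }

    /** Master side: step 3, applied to one region reported by one region server. */
    static Verdict check(MasterView view, String reportingServer, String regionName) {
        Optional<String> assigned = view.serverInMeta(regionName);
        if (assigned.isPresent() && !assigned.get().equals(reportingServer)) {
            return Verdict.STALE_CLIENT;   // (a) assigned elsewhere: ignore
        }
        if (view.isInTransition(regionName)) {
            return Verdict.IN_TRANSITION;  // (b) being moved or opened: ignore
        }
        return Verdict.INCONSISTENT;       // (c) log loudly, then unassign or abort
    }
}
```

A caller on the master would act only on INCONSISTENT, which is where the loud logging and
the choice between unassigning the region and exiting in "paranoid mode" would go.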

