hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ramkrishna.s.vasudevan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
Date Fri, 13 Jan 2012 18:56:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185748#comment-13185748

ramkrishna.s.vasudevan commented on HBASE-5155:

Yes Ted. Some of that part was refactored by some defect by Ming.

Also in trunk as HBASE-4083 is there we have disablingTables and also enablingTables.  So
for trunk may be we may have to apply the changes considering the code there.
> ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation
of regions that were deleted
> -----------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-5155
>                 URL: https://issues.apache.org/jira/browse/HBASE-5155
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.4
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.90.6
>         Attachments: HBASE-5155_latest.patch, hbase-5155_6.patch
> ServerShutDownHandler and disable/delete table handler races.  This is not an issue due
to TM.
> -> A regionserver goes down.  In our cluster the regionserver holds lot of regions.
> -> A region R1 has two daughters D1 and D2.
> -> The ServerShutdownHandler gets called and scans the META and gets all the user
> -> Parallely a table is disabled. (No problem in this step).
> -> Delete table is done.
> -> The tables and its regions are deleted including R1, D1 and D2.. (So META is cleaned)
> -> Now ServerShutdownhandler starts to processTheDeadRegion
> {code}
>  if (hri.isOffline() && hri.isSplit()) {
>       LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
>         "; checking daughter presence");
>       fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixUpDaughters as the daughers D1 and D2 is missing for R1 
> {code}
>     if (isDaughterMissing(catalogTracker, daughter)) {
>       LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
>       MetaEditor.addDaughter(catalogTracker, daughter, null);
>       // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
>       // there then something wonky about the split -- things will keep going
>       // but could be missing references to parent region.
>       // And assign it.
>       assignmentManager.assign(daughter, true);
> {code}
> we call assign of the daughers.  
> Now after this we again start with the below code.
> {code}
>         if (processDeadRegion(e.getKey(), e.getValue(),
>             this.services.getAssignmentManager(),
>             this.server.getCatalogTracker())) {
>           this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> Now when the SSH scanned the META it had R1, D1 and D2.
> So as part of the above code D1 and D2 which where assigned by fixUpDaughters
> is again assigned by 
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> Thus leading to a zookeeper issue due to bad version and killing the master.
> The important part here is the regions that were deleted are recreated which i think
is more critical.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message