hbase-issues mailing list archives

From "stack (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
Date Fri, 13 Jan 2012 21:14:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185858#comment-13185858 ]

stack commented on HBASE-5155:
------------------------------

bq. So if we have a rolling restart scenario then this will be a problem, right? Previously the table node will not be present for the Enabled state but now we will create it.

Have you tried it?  In rolling restart we'll upgrade the master first usually.  Won't it know
how to deal w/ new zk node for ENABLED state?

FYI, don't do these kinda changes in future:

{code}
-      for (HRegionInfo region: regions) {
+      for (HRegionInfo region : regions) {
{code}

What was there previously was fine... It adds bulk to your patch.

This looks like a method used internally by AM only.  Does it need to be public?

{code}
+  public void setEnabledTable(String tableName) {
{code}

In processDeadRegion, should we check parent exists before doing daughter fixups? (It could
have been deleted?)
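
To illustrate the kind of guard I mean, here is a minimal, self-contained sketch. It is not HBase code: `ParentCheckSketch` and its methods are hypothetical stand-ins, with META modelled as a plain set of region names. A real fix would also have to deal with the check and the fixup not being atomic.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the catalog (META) table: just a set of region names.
final class ParentCheckSketch {
  private final Set<String> meta = new HashSet<>();

  void addRegion(String name) { meta.add(name); }

  void deleteRegion(String name) { meta.remove(name); }

  boolean exists(String name) { return meta.contains(name); }

  /**
   * Re-adds missing daughters only if the parent is still present in META.
   * Returns the number of daughters that were re-added.
   */
  int fixupDaughters(String parent, String... daughters) {
    if (!exists(parent)) {
      // Parent is gone, e.g. the table was deleted after the shutdown
      // handler took its scan. Do not resurrect the daughters.
      return 0;
    }
    int fixed = 0;
    for (String daughter : daughters) {
      // Set.add returns true only if the daughter was actually missing.
      if (meta.add(daughter)) {
        fixed++;
      }
    }
    return fixed;
  }
}
```

With the guard in place, a fixup for a parent that has been deleted in the meantime is a no-op rather than a re-creation of D1 and D2.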

I don't understand this comment:

{code}
+    // Enable the ROOT table if on process fail over the RS containing ROOT
+    // was active.
{code}

Same for the one on .meta.

Why do we have to enable the meta and root tables?  Aren't they always on?

Is this right:

{code}
+   * Check if the table is in DISABLED state in cache
{code}

Is it just checking cache?  This class gets updated when the zk changes, right?  So it's not
just a 'cache'?  I think you should drop 'in cache' from your public javadoc.

Same for isDisabling, etc

Is this right below:

{code}
   public boolean isEnabledTable(String tableName) {
-    synchronized (this.cache) {
-      // No entry in cache means enabled table.
-      return !this.cache.containsKey(tableName);
-    }
+    return isTableState(tableName, TableState.ENABLED);
{code}
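
The reason I ask is that the two versions disagree for a table that has no entry at all. A small sketch of the difference, assuming `isTableState` is a plain equality check against the stored state (the patch hunk above does not show its body, so that is a guess); `EnabledCheckSketch` and its method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the table-state map; not the actual HBase class.
final class EnabledCheckSketch {
  enum TableState { ENABLED, DISABLED, DISABLING, ENABLING }

  private final Map<String, TableState> cache = new HashMap<>();

  void setState(String tableName, TableState state) {
    cache.put(tableName, state);
  }

  // Old behaviour: no entry in the map means the table is enabled.
  boolean isEnabledOld(String tableName) {
    return !cache.containsKey(tableName);
  }

  // New behaviour (assumed): enabled only if an explicit ENABLED entry exists.
  boolean isEnabledNew(String tableName) {
    return cache.get(tableName) == TableState.ENABLED;
  }
}
```

Under that assumption, a table with no entry is enabled by the old check and not enabled by the new one, which is the case to think through for tables created before the patch.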

Else patch looks good to me.  I was afraid it was too much for 0.90.6, but it's looking OK.



                
> ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5155
>                 URL: https://issues.apache.org/jira/browse/HBASE-5155
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.4
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5155_1.patch, HBASE-5155_latest.patch, hbase-5155_6.patch
>
>
> The ServerShutdownHandler and the disable/delete table handlers race.  This is not an issue due to TM.
> -> A regionserver goes down.  In our cluster the regionserver holds a lot of regions.
> -> A region R1 has two daughters D1 and D2.
> -> The ServerShutdownHandler gets called, scans META and gets all the user regions.
> -> In parallel, a table is disabled. (No problem in this step.)
> -> Delete table is done.
> -> The table and its regions are deleted, including R1, D1 and D2. (So META is cleaned.)
> -> Now ServerShutdownHandler starts processDeadRegion:
> {code}
>  if (hri.isOffline() && hri.isSplit()) {
>       LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
>         "; checking daughter presence");
>       fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixupDaughters, because the daughters D1 and D2 are missing for R1,
> {code}
>     if (isDaughterMissing(catalogTracker, daughter)) {
>       LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
>       MetaEditor.addDaughter(catalogTracker, daughter, null);
>       // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
>       // there then something wonky about the split -- things will keep going
>       // but could be missing references to parent region.
>       // And assign it.
>       assignmentManager.assign(daughter, true);
> {code}
> we call assign on the daughters.
> Now after this we again start with the below code.
> {code}
>         if (processDeadRegion(e.getKey(), e.getValue(),
>             this.services.getAssignmentManager(),
>             this.server.getCatalogTracker())) {
>           this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> Now when the SSH scanned META it had R1, D1 and D2.
> So as part of the above code D1 and D2, which were assigned by fixupDaughters,
> are assigned again by
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a ZooKeeper issue due to a bad version, which kills the master.
> The important part here is that the regions that were deleted are recreated, which I think is more critical.


        
