hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6348) Region assignments should be only allowed edit META hosted on the same cluster.
Date Mon, 09 Jul 2012 19:11:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409743#comment-13409743 ]

Jean-Daniel Cryans commented on HBASE-6348:
-------------------------------------------

Wouldn't it just be easier to make sure that .META. is assigned correctly? IIUC this is where
the problem happened (HMaster.assignRootAndMeta):

{code}
    if (!this.catalogTracker.verifyMetaRegionLocation(timeout)) {
      ServerName currentMetaServer =
        this.catalogTracker.getMetaLocationOrReadLocationFromRoot();
      if (currentMetaServer != null
          && !currentMetaServer.equals(currentRootServer)) {
        splitLogAndExpireIfOnline(currentMetaServer);
      }
      assignmentManager.assignMeta();
      this.catalogTracker.waitForMeta();
      // Above check waits for general meta availability but this does not
      // guarantee that the transition has completed
      this.assignmentManager.waitForAssignment(HRegionInfo.FIRST_META_REGIONINFO);
      assigned++;
    } else {
      // Region already assigned.  We didn't assign it.  Add to in-memory state.
      this.assignmentManager.regionOnline(HRegionInfo.FIRST_META_REGIONINFO,
        this.catalogTracker.getMetaLocation());
    }
{code}

When the location was verified, the check was able to read the old .META. location from ROOT, and
since the region was still being served there, .META. was assumed to be correctly assigned. What's
interesting is this, from AM.regionOnline:

{code}
      if (isServerOnline(sn)) {
        this.regions.put(regionInfo, sn);
        addToServers(sn, regionInfo);
        this.regions.notifyAll();
      } else {
        LOG.info("The server is not in online servers, ServerName=" + 
          sn.getServerName() + ", region=" + regionInfo.getEncodedName());
      }
{code}

I assume that if you went through the master's log you would find the log message about the server
not being online? It seems to me that we should either check whether the server belongs to us or
backtrack when we fail to set the region online.
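
To make the second option concrete, here is a minimal sketch of "backtrack when we fail to set the region online". It is not the AssignmentManager code: RegionOnlineSketch, ensureMetaOnline and the server names are hypothetical, and the only idea taken from the snippets above is that regionOnline should report failure instead of just logging it, so the caller can fall back to assigning .META. itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the master's in-memory assignment state;
// this is NOT HBase's AssignmentManager, only an illustration of the idea.
final class RegionOnlineSketch {
  static final String META = ".META.";

  private final Set<String> onlineServers = new HashSet<>();
  private final Map<String, String> regions = new HashMap<>();

  void serverOnline(String serverName) {
    onlineServers.add(serverName);
  }

  /** Records region -> server only if the server is one of ours, and reports the outcome. */
  boolean regionOnline(String regionName, String serverName) {
    if (!onlineServers.contains(serverName)) {
      return false; // not our server: the caller must not assume the region is assigned
    }
    regions.put(regionName, serverName);
    return true;
  }

  /** Trusts the location read from ROOT only if regionOnline accepts it; otherwise backtracks. */
  String ensureMetaOnline(String locationFromRoot, String fallbackServer) {
    if (locationFromRoot != null && regionOnline(META, locationFromRoot)) {
      return locationFromRoot;
    }
    // Backtrack: the location from ROOT does not belong to this cluster, so reassign.
    if (!regionOnline(META, fallbackServer)) {
      throw new IllegalStateException("no online server available for " + META);
    }
    return fallbackServer;
  }

  public static void main(String[] args) {
    RegionOnlineSketch am = new RegionOnlineSketch();
    am.serverOnline("new-rs1,60020,2");
    // ROOT still points at a server of the old cluster, which we do not know about.
    System.out.println(am.ensureMetaOnline("old-rs7,60020,1", "new-rs1,60020,2"));
    // prints new-rs1,60020,2
  }
}
```

In the real master the fallback would be the assignMeta() path from the first snippet rather than a hard-coded server.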
                
> Region assignments should be only allowed edit META hosted on the same cluster.
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-6348
>                 URL: https://issues.apache.org/jira/browse/HBASE-6348
>             Project: HBase
>          Issue Type: Task
>            Reporter: Jonathan Hsieh
>
> We copied HBase file data (root/meta/tables) from one HDFS cluster to another, scrubbed
> it, and then attempted to start the new cluster.  We noticed that META on the original cluster
> was being modified with server entries from the new cluster.
> It's contrived, but here is how it happened.
> First we copied all the data.  Then we "scrubbed" META -- we removed all region serverinfo
> cols that pointed to nodes on the original cluster.  When we started the new cluster, it picked
> an RS to serve ROOT.  Since we had scrubbed META, the new cluster's master attempted to
> assign regions to other region servers on the new cluster.  From the code's point of view
> this all succeeded -- zk went through transitions, and according to the master they were assigned.
> However, we started seeing NotServingRegionExceptions on the original cluster.
> The root cause is that ROOT was not scrubbed.  The new cluster assigned the copy of ROOT
> to a new cluster RS.  Now, when the new cluster attempted to modify META, it would read the
> old ROOT's serverinfo pointer and go to the *old cluster's regionserver*.  The old cluster's regionserver
> just so happened to be still serving META, so the old cluster's META server gladly accepted
> the assignments that included the new cluster's regionserver names.
> At this point we brought down the new cluster (it was getting killed).  Clients on the
> old cluster would now go to zk, root, meta, and get pointers to the new cluster.  NSREs happened.
> Unhappiness.
> Long story short, we should have some mechanism to make sure that region assignments
> are only allowed to edit META hosted on the same cluster.
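
One possible shape for the mechanism requested in the description is an identity check before any META edit. The sketch below is only an illustration under assumptions: ClusterGuardSketch, MetaHost and the cluster id strings are hypothetical, and it takes for granted that both the master and the server hosting META can report which cluster they belong to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a same-cluster guard; none of these types exist in HBase.
final class ClusterGuardSketch {

  /** Hypothetical view of a region server hosting META, tagged with its cluster's id. */
  static final class MetaHost {
    final String clusterId;
    final List<String> edits = new ArrayList<>();

    MetaHost(String clusterId) {
      this.clusterId = clusterId;
    }
  }

  /** Writes an assignment to META only when the host belongs to the caller's cluster. */
  static boolean assign(String myClusterId, MetaHost metaHost, String region, String server) {
    if (!myClusterId.equals(metaHost.clusterId)) {
      return false; // a stale ROOT pointer led to another cluster's META; refuse the edit
    }
    metaHost.edits.add(region + " -> " + server);
    return true;
  }

  public static void main(String[] args) {
    MetaHost oldMeta = new MetaHost("cluster-old");
    MetaHost newMeta = new MetaHost("cluster-new");
    // The new cluster's master follows the unscrubbed ROOT to the old cluster's META host.
    System.out.println(assign("cluster-new", oldMeta, "t1,,1", "new-rs1")); // prints false
    System.out.println(assign("cluster-new", newMeta, "t1,,1", "new-rs1")); // prints true
  }
}
```

With a guard like this, the old cluster's META would have rejected the new cluster's assignments instead of accepting them.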


        
