hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "nkeywal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8097) MetaServerShutdownHandler may potentially keep bumping up DeadServer.numProcessing
Date Fri, 15 Mar 2013 08:00:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603218#comment-13603218
] 

nkeywal commented on HBASE-8097:
--------------------------------

bq. The timestamp updated in DeadServer.add seems bogus to me.
You're right. It should be a putIfAbsent if it was a concurrentMap. The methods are synchronized
and it should be critical, so the following implementation should be correct:
{code}
  /**
   * Adds the server to the dead server list if it's not there already.
   * @param sn the server name
   */
  public synchronized void add(ServerName sn) {
    this.numProcessing++;
    if (!deadServers.containsKey(sn)){
      deadServers.put(sn, EnvironmentEdgeManager.currentTimeMillis());
    }
  }
{code}

Tell me if you want me to create another JIRA for this or if you don't mind adding this into
this JIRA.

For numProcessing, it seems there are several bugs as well: if you have an exception anywhere
(for example in verifyAndAssignMetaWithRetries();) we have a broken state: we haven't decreased
the numProcessing but we're not working on it anymore. If we want to be exception safe (as
ServerShutdownHandler is) we need the finally imho. I can't find a better solution than:

{code}
  @Override
  public void process() throws IOException {
    boolean gotException = true;
    try {
      try {
        LOG.info("Splitting META logs for " + serverName);
        if (this.shouldSplitHlog) {
          this.services.getMasterFileSystem().splitMetaLog(serverName);
        }
      } catch (IOException ioe) {
        this.deadServers.add(serverName);
        this.services.getExecutorService().submit(this);
        throw new IOException("failed log splitting for " +
            serverName + ", will retry", ioe);
      }


      // Assign root and meta if we were carrying them.
      if (isCarryingMeta()) { // .META.
        // Check again: region may be assigned to other where because of RIT
        // timeout
        if (this.services.getAssignmentManager().isCarryingMeta(serverName)) {
          LOG.info("Server " + serverName
              + " was carrying META. Trying to assign.");
          this.services.getAssignmentManager().regionOffline(
              HRegionInfo.FIRST_META_REGIONINFO);
          verifyAndAssignMetaWithRetries();
        } else {
          LOG.info("META has been assigned to otherwhere, skip assigning.");
        }
      }
      gotException = false;
    } finally {
      if (gotException){
        // If we had an exception we can't rely on super.process to say we finished the process.
        this.deadServers.finish(serverName);
      }
    }
    super.process();
  }
{code}

I can't say I like, but it should do the job...


                
> MetaServerShutdownHandler may potentially keep bumping up DeadServer.numProcessing
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8097
>                 URL: https://issues.apache.org/jira/browse/HBASE-8097
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.96.0
>
>         Attachments: 8097.txt, hbase-8097_1.patch
>
>
> {code}
>     } catch (IOException ioe) {
>       this.services.getExecutorService().submit(this);
>       this.deadServers.add(serverName);
>       throw new IOException("failed log splitting for " +
>           serverName + ", will retry", ioe);
>     }
> {code}
> this.deadServers.add(serverName); will keep incrementing DeadServer.numProcessing
> We can't get rid of numProcessing by just checking deadServers.size() because deadServers
is also used to report some historically failed RSs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message