hbase-dev mailing list archives

From "chunhui shen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-7670) Synchronized operation in CatalogTracker would block handling ZK Event for long time
Date Fri, 25 Jan 2013 08:13:12 GMT
chunhui shen created HBASE-7670:
-----------------------------------

             Summary: Synchronized operation in CatalogTracker would block handling ZK Event for long time
                 Key: HBASE-7670
                 URL: https://issues.apache.org/jira/browse/HBASE-7670
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.94.4
            Reporter: chunhui shen
            Assignee: chunhui shen
            Priority: Critical
             Fix For: 0.96.0
         Attachments: HBASE-7670.patch

During testing, we found that the master did not handle ZK events for a long time.
It appears that one ZK event-handling thread was blocked.
Some logs from the master are attached below:
{code}
2013-01-16 22:18:55,667 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,

2013-01-16 22:18:56,270 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,

...
2013-01-16 23:55:33,259 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Retrying
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=100, exceptions:
        at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:183)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:676)
        at org.apache.hadoop.hbase.catalog.MetaReader.get(MetaReader.java:247)
        at org.apache.hadoop.hbase.catalog.MetaReader.getRegion(MetaReader.java:349)
        at org.apache.hadoop.hbase.catalog.MetaReader.readRegionLocation(MetaReader.java:289)
        at org.apache.hadoop.hbase.catalog.MetaReader.getMetaRegionLocation(MetaReader.java:276)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:424)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:489)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:451)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:289)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-01-16 23:55:33,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Attempted to handle region transition for server but server is not online
{code}

Between 2013-01-16 22:18:56 and 2013-01-16 23:55:33, there are no log entries at all about handling ZK events.


{code}
// ZK watcher callback; runs on the ZK event thread
this.metaNodeTracker = new MetaNodeTracker(zookeeper, throwableAborter) {
  public void nodeDeleted(String path) {
    if (!path.equals(node)) return;
    ct.resetMetaLocation();
  }
};

public void resetMetaLocation() {
  LOG.debug("Current cached META location, " + metaLocation +
    ", is not valid, resetting");
  // Blocks here while another thread holds the metaAvailable monitor
  synchronized (this.metaAvailable) {
    this.metaAvailable.set(false);
    this.metaAvailable.notifyAll();
  }
}

private AdminProtocol getMetaServerConnection() {
  synchronized (metaAvailable) {
    ...
    // May keep retrying for a long time while the monitor is held
    ServerName newLocation = MetaReader.getMetaRegionLocation(this);
    ...
  }
}
{code}

From the above code, we can see that nodeDeleted() has to wait on synchronized (metaAvailable)
until MetaReader.getMetaRegionLocation(this) is done.
However, getMetaRegionLocation() could be retrying for a long time, and the ZK event thread
stays blocked for that whole period.
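
One possible way to avoid this kind of blocking is to run the slow lookup outside the monitor and take the lock only to read or publish the cached value. The following is a minimal sketch of that idea, not the attached patch and not HBase code: the class MetaLocationSketch, the slowLookup supplier (standing in for MetaReader.getMetaRegionLocation(this)), and the resetCount field are all hypothetical names introduced for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical stand-in for CatalogTracker; not HBase code and not the attached patch.
final class MetaLocationSketch {
    private final AtomicBoolean metaAvailable = new AtomicBoolean(false);
    // Stands in for MetaReader.getMetaRegionLocation(this), which may retry for a long time.
    private final Supplier<String> slowLookup;
    private String metaLocation; // guarded by metaAvailable
    private long resetCount;     // guarded by metaAvailable

    MetaLocationSketch(Supplier<String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // Called from the ZK event thread (nodeDeleted); the monitor is only held briefly.
    void resetMetaLocation() {
        synchronized (metaAvailable) {
            metaLocation = null;
            resetCount++;
            metaAvailable.set(false);
            metaAvailable.notifyAll();
        }
    }

    String getMetaLocation() {
        long seenResets;
        synchronized (metaAvailable) {
            if (metaAvailable.get()) {
                return metaLocation; // fast path: cached location is still valid
            }
            seenResets = resetCount;
        }
        // The slow, possibly retrying lookup runs WITHOUT holding the monitor.
        String newLocation = slowLookup.get();
        synchronized (metaAvailable) {
            // Cache the result only if no reset arrived while the lookup was running.
            if (newLocation != null && seenResets == resetCount) {
                metaLocation = newLocation;
                metaAvailable.set(true);
                metaAvailable.notifyAll();
            }
        }
        return newLocation;
    }
}
```

With this shape, resetMetaLocation() never waits on a lookup in progress, so the ZK event thread can keep processing events. The trade-off is that two callers may run the lookup concurrently, and a result that raced with a reset is returned to its caller but not cached.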
