hbase-dev mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-11488) cancelTasks in SubprocedurePool can hang during task error
Date Thu, 10 Jul 2014 01:55:04 GMT
Jerry He created HBASE-11488:

             Summary: cancelTasks in SubprocedurePool can hang during task error
                 Key: HBASE-11488
                 URL: https://issues.apache.org/jira/browse/HBASE-11488
             Project: HBase
          Issue Type: Bug
          Components: snapshots
    Affects Versions: 0.98.3, 0.96.1, 0.99.0
            Reporter: Jerry He
            Assignee: Jerry He
            Priority: Minor

During a snapshot on the region server side, if one RegionSnapshotTask throws an exception,
we cancel the other tasks.
In RegionServerSnapshotManager.SnapshotSubprocedurePool.waitForOutstandingTasks():
      LOG.debug("Waiting for local region snapshots to finish.");

      int sz = futures.size();
      try {
        // Using the completion service to process the futures that finish first first.
        for (int i = 0; i < sz; i++) {
          Future<Void> f = taskPool.take();
          f.get();
          if (!futures.remove(f)) {
            LOG.warn("unexpected future" + f);
          }
          LOG.debug("Completed " + (i+1) + "/" + sz +  " local region snapshots.");
        }
        LOG.debug("Completed " + sz +  " local region snapshots.");
        return true;
      } catch (InterruptedException e) {
        LOG.warn("Got InterruptedException in SnapshotSubprocedurePool", e);
        if (!stopped) {
          throw new ForeignException("SnapshotSubprocedurePool", e);
        }
        // we are stopped so we can just exit.
      } catch (ExecutionException e) {
        if (e.getCause() instanceof ForeignException) {
          LOG.warn("Rethrowing ForeignException from SnapshotSubprocedurePool", e);
          throw (ForeignException)e.getCause();
        }
        LOG.warn("Got Exception in SnapshotSubprocedurePool", e);
        throw new ForeignException(name, e.getCause());
      } finally {
        cancelTasks();
      }
If f.get() throws an ExecutionException (for example, caused by a NotServingRegionException),
we call cancelTasks().
In cancelTasks():
      // evict remaining tasks and futures from taskPool.
      while (!futures.isEmpty()) {
        // block to remove cancelled futures;
        LOG.warn("Removing cancelled elements from taskPool");
        futures.remove(taskPool.take());
      }

For example, suppose we have 3 tasks and the first one fails, so f.get() throws right after
          Future<Void> f = taskPool.take();
At that point we have not yet removed 'f' from the 'futures' list, but we have already taken it
from taskPool.
As a result, there are 3 entries in the 'futures' list, but only 2 remain in taskPool.
Once those 2 are drained, the list is still not empty, so the next taskPool.take() in the
cancelTasks() loop above blocks.
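
The mismatch can be reproduced outside HBase with a plain ExecutorCompletionService. The
following is only a minimal sketch, not HBase code: the class name CancelTasksHangDemo, the
helper countAfterFirstFailure(), and the simulated failure are all assumptions made for
illustration. It uses a timed poll() instead of take() so that the demo terminates instead of
hanging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone demo; not part of HBase.
public class CancelTasksHangDemo {

  /** Returns {size of the futures list, completions still obtainable from taskPool}. */
  static int[] countAfterFirstFailure() throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(3);
    ExecutorCompletionService<Void> taskPool = new ExecutorCompletionService<>(executor);
    List<Future<Void>> futures = new ArrayList<>();
    CountDownLatch release = new CountDownLatch(1);
    try {
      // Task 1 fails immediately; tasks 2 and 3 wait so that task 1 completes first.
      futures.add(taskPool.submit(() -> { throw new IllegalStateException("simulated failure"); }));
      for (int i = 0; i < 2; i++) {
        futures.add(taskPool.submit(() -> { release.await(); return null; }));
      }

      Future<Void> f = taskPool.take();   // consumes one completion from taskPool
      try {
        f.get();                          // throws, so the remove below is skipped
        futures.remove(f);
      } catch (ExecutionException expected) {
        // f has left taskPool but is still in the futures list.
      }

      release.countDown();                // let tasks 2 and 3 finish
      int drained = 0;
      // A timed poll stands in for take(); take() would block on the third call.
      while (taskPool.poll(500, TimeUnit.MILLISECONDS) != null) {
        drained++;
      }
      return new int[] { futures.size(), drained };
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int[] counts = countAfterFirstFailure();
    System.out.println("futures=" + counts[0] + " completionsLeft=" + counts[1]);
  }
}
```

The list still holds 3 futures while only 2 completions can be taken, which is the condition
under which a loop of the form "while (!futures.isEmpty()) taskPool.take()" cannot finish.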

The end result is that the procedure always fails with a timeout exception.
We could have bailed out earlier with the real cause.
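
One possible way to avoid the blocking wait, sketched here under assumptions and not
necessarily the change that should go into HBase, is to stop tying the loop to the size of the
'futures' list: cancel everything, clear the list, and drain taskPool with the non-blocking
poll(). The class NonBlockingCancelSketch and the helpers cancelAndDrain() and demo() are
hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-alone sketch; not part of HBase.
public class NonBlockingCancelSketch {

  /** Cancels all futures, clears the list, and drains finished tasks without blocking. */
  static <T> int cancelAndDrain(List<Future<T>> futures, CompletionService<T> taskPool) {
    for (Future<T> f : futures) {
      f.cancel(false);                    // cancel without interrupting running tasks
    }
    futures.clear();                      // do not wait for one completion per future
    int drained = 0;
    while (taskPool.poll() != null) {     // poll() returns null instead of blocking
      drained++;
    }
    return drained;
  }

  /** Replays the 3-task scenario and returns the size of the futures list afterwards. */
  static int demo() throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(3);
    CompletionService<Void> taskPool = new ExecutorCompletionService<>(executor);
    List<Future<Void>> futures = new ArrayList<>();
    CountDownLatch neverReleased = new CountDownLatch(1);
    try {
      futures.add(taskPool.submit(() -> { throw new IllegalStateException("simulated failure"); }));
      futures.add(taskPool.submit(() -> { neverReleased.await(); return null; }));
      futures.add(taskPool.submit(() -> { neverReleased.await(); return null; }));

      Future<Void> f = taskPool.take();   // the failed task completes first
      try {
        f.get();
      } catch (ExecutionException e) {
        // e.getCause() is the real cause that the caller could report right away.
      }
      cancelAndDrain(futures, taskPool);  // returns even though f was already taken
      return futures.size();
    } finally {
      executor.shutdownNow();             // interrupts the two waiting tasks
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("futures left after cancel: " + demo());
  }
}
```

With this shape the cancel path returns promptly, so the ForeignException carrying the original
cause can propagate instead of being masked by a later timeout.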
