hbase-issues mailing list archives

From "subramanian raghunathan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4462) Properly treating SocketTimeoutException
Date Fri, 23 Sep 2011 10:32:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113312#comment-13113312
] 

subramanian raghunathan commented on HBASE-4462:
------------------------------------------------

This is what I observed in the trunk code, in HCM.getRegionServerWithRetries() and the callable's shouldRetry():

{code}

    try {
      callable.instantiateServer(tries != 0);
      callable.beforeCall();
      return callable.call();
    } catch (Throwable t) {
      callable.shouldRetry(t);
      t = translateException(t);
      exceptions.add(t);
      if (tries == numRetries - 1) {
        throw new RetriesExhaustedException(callable.getServerName(),
            callable.getRegionName(), callable.getRow(), tries, exceptions);
      }
    } finally {
      callable.afterCall();
    }


  public void shouldRetry(Throwable throwable) throws IOException {
    if (this.callTimeout != HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT)
      if (throwable instanceof SocketTimeoutException
          || (this.endTime - this.startTime > this.callTimeout)) {
        throw (SocketTimeoutException) (SocketTimeoutException) new SocketTimeoutException(
            "Call to access row '" + Bytes.toString(row) + "' on table '"
                + Bytes.toString(tableName)
                + "' failed on socket timeout exception: " + throwable)
            .initCause(throwable);
      } else {
        this.callTimeout = ((int) (this.endTime - this.startTime));
      }
  }

{code}

shouldRetry handles SocketTimeoutException specially: when a non-default operation
timeout is set, there is no retry count or retry pause for a SocketTimeoutException,
and the exception is rethrown immediately.
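
To make that control flow concrete, here is a minimal, self-contained sketch. It is
not the HBase code: TimeoutRetrySketch, run(), the customTimeoutSet flag and the
attempts counter are hypothetical names for illustration, the elapsed-time check is
left out, and a plain IOException stands in for RetriesExhaustedException.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Simplified stand-in for the retry loop quoted above; not the HBase classes. */
final class TimeoutRetrySketch {
    /** Number of call attempts made by the most recent run(). */
    static int attempts;

    /** Like the quoted shouldRetry(): a timeout is rethrown, anything else falls through. */
    static void shouldRetry(Throwable t, boolean customTimeoutSet) throws IOException {
        if (customTimeoutSet && t instanceof SocketTimeoutException) {
            SocketTimeoutException ste =
                new SocketTimeoutException("failed on socket timeout exception: " + t);
            ste.initCause(t);
            throw ste;
        }
    }

    /** Calls up to numRetries times; a rethrown timeout ends the loop at once. */
    static <T> T run(Callable<T> call, int numRetries, boolean customTimeoutSet)
            throws IOException {
        attempts = 0;
        List<Throwable> exceptions = new ArrayList<>();
        for (int tries = 0; tries < numRetries; tries++) {
            try {
                attempts++;
                return call.call();
            } catch (Throwable t) {
                shouldRetry(t, customTimeoutSet); // may throw and leave the loop
                exceptions.add(t);
            }
        }
        // Assumption: plain IOException instead of RetriesExhaustedException.
        throw new IOException("retries exhausted after " + exceptions.size() + " failures");
    }
}
```

In this sketch a callable that always throws SocketTimeoutException is attempted once
when customTimeoutSet is true, while one that throws any other IOException is
attempted numRetries times.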

This was done as part of HBASE-2937 (Facilitate Timeouts In HBase Client).

But the same handling is not present in 0.90.x. Are the fix in HBASE-2937 and the
current JIRA related? If so, can we backport it?

Please correct me if I am wrong somewhere.

> Properly treating SocketTimeoutException
> ----------------------------------------
>
>                 Key: HBASE-4462
>                 URL: https://issues.apache.org/jira/browse/HBASE-4462
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.92.0
>
>
> SocketTimeoutException is currently treated like any IOE inside of HCM.getRegionServerWithRetries
> and I think this is a problem. This method should only do retries in cases where we are pretty
> sure the operation will complete, but with STE we already waited for (by default) 60 seconds
> and nothing happened.
> I found this while debugging Douglas Campbell's problem on the mailing list where it
> seemed like he was using the same scanner from multiple threads, but actually it was just
> the same client doing retries while the first run didn't even finish yet (that's another problem).
> You could see the first scanner, then up to two other handlers waiting for it to finish in
> order to run (because of the synchronization on RegionScanner).
> So what should we do? We could treat STE as a DoNotRetryException and let the client
> deal with it, or we could retry only once.
> There's also the option of having a different behavior for get/put/icv/scan, the issue
> with operations that modify a cell is that you don't know if the operation completed or not
> (same when a RS dies hard after completing let's say a Put but just before returning to the
> client).
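
The first two options in the description (fail fast on STE, or retry only once) can be
sketched as a small policy function. This is only an illustration under stated
assumptions: SteRetryPolicySketch, Policy and allowedAttempts are hypothetical names,
not part of HBase, and the per-operation (get/put/icv/scan) variant is not covered.

```java
import java.net.SocketTimeoutException;

/** Hypothetical policy helper; none of these names exist in HBase. */
final class SteRetryPolicySketch {
    enum Policy { RETRY_LIKE_ANY_IOE, FAIL_FAST, RETRY_ONCE }

    /** Returns the total number of attempts allowed for the given failure. */
    static int allowedAttempts(Throwable failure, int numRetries, Policy policy) {
        if (!(failure instanceof SocketTimeoutException)) {
            return numRetries; // other failures keep the normal retry budget
        }
        switch (policy) {
            case FAIL_FAST:
                return 1; // behaves like a do-not-retry failure
            case RETRY_ONCE:
                return Math.min(2, numRetries); // first attempt plus one retry
            default:
                return numRetries; // behaviour described as the current problem
        }
    }
}
```

A retry loop would consult allowedAttempts after each failure and stop once the number
of attempts made reaches the returned value.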

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
