From "Miklos Kurucz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2445) Clean up client retry policies
Date Tue, 04 May 2010 12:42:56 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863789#action_12863789 ]

Miklos Kurucz commented on HBASE-2445:

I am having problems when setting hbase.client.retries.number = 1.
In that case the cache will never be updated.

Jean-Daniel Cryans asked me:
"Yeah I understand that retries are unusable at that level, but you still want retries in
order to be able to recalibrate the .META. cache right?"
My answer is that I want HCM to update the cache when it is necessary, but I don't think that
should only happen in retries.
To me it seems that the two things can be separated:
Whenever a NotServingRegionException is caught, the cache entry should be cleared.
When an exception is caught that is not related to bad cache entries, the cache should not be cleared.
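A minimal sketch of that separation, in Java. Everything here is hypothetical: LocationCache and onException are illustrative names, and NotServingRegionException is a local stand-in for the HBase class so the example compiles on its own. The only point it makes is that invalidation is keyed on the exception type, not on whether another retry will follow.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class CacheInvalidationSketch {
  // Local stand-in for HBase's NotServingRegionException so the sketch compiles alone.
  static class NotServingRegionException extends IOException {
    NotServingRegionException(String msg) { super(msg); }
  }

  // Hypothetical region-location cache: region name -> server address.
  static class LocationCache {
    private final Map<String, String> locations = new HashMap<>();

    void put(String region, String server) { locations.put(region, server); }

    String get(String region) { return locations.get(region); }

    // Invalidate only when the exception says the cached location is wrong;
    // this runs on every failure, whatever the retry count is.
    void onException(String region, Throwable t) {
      if (t instanceof NotServingRegionException) {
        locations.remove(region);
      }
      // Other exceptions leave the cached entry untouched.
    }
  }

  public static void main(String[] args) {
    LocationCache cache = new LocationCache();
    cache.put("regionA", "rs1:60020");
    cache.put("regionB", "rs2:60020");

    cache.onException("regionA", new NotServingRegionException("regionA moved"));
    cache.onException("regionB", new IOException("unrelated failure"));

    System.out.println(cache.get("regionA")); // prints "null": stale entry dropped
    System.out.println(cache.get("regionB")); // prints "rs2:60020": entry kept
  }
}
```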

If I'm correct, the current exception handling works like this:

The server sends an NSRE, which ScannerCallable turns into a DoNotRetryIOException:
          if (ioe instanceof NotServingRegionException) {
            // Throw a DNRE so that we break out of cycle of calling NSRE
            // when what we need is to open scanner against new location.
            // Attach NSRE to signal client that it needs to resetup scanner.
            throw new DoNotRetryIOException("Reset scanner", ioe);
          }

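The retry loop passes it through translateException(), which rethrows a DNRE immediately:
        } catch (Throwable t) {
          t = translateException(t);
          // ...
      // in translateException():
      if (t instanceof DoNotRetryIOException) {
        throw (DoNotRetryIOException)t;
      }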

The scanner then catches the DNRE and checks whether its cause is an NSRE:
          } catch (DoNotRetryIOException e) {
            // ...
            Throwable cause = e.getCause();
            if (cause == null || !(cause instanceof NotServingRegionException)) {
              throw e;
            }
            // Else, its signal from depths of ScannerCallable that we got an
            // NSRE on a next and that we need to reset the scanner.

After resetting the scanner, we get to:
        callable = getScannerCallable(localStartKey, nbRows);
        // Open a scanner on the region server starting at the
        // beginning of the region
        getConnection().getRegionServerWithRetries(callable);

Won't this still use the bad cache entry first, and run into the same problem?
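To make the concern concrete, here is a small, self-contained simulation. It assumes (this is the question being asked, not confirmed HBase behavior) that the location lookup consults the cache first and that nothing removes the stale entry on an NSRE. All class and method names are hypothetical. With clearOnNsre = false every reset hits the old server again; once the entry is dropped, the second attempt finds the region.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class StaleCacheResetSketch {
  // Local stand-in for HBase's NotServingRegionException.
  static class NotServingRegionException extends IOException {
    NotServingRegionException(String msg) { super(msg); }
  }

  final Map<String, String> cache = new HashMap<>(); // region -> cached server
  final String actualServer;                         // where the region really is now
  int openAttempts = 0;

  StaleCacheResetSketch(String actualServer) { this.actualServer = actualServer; }

  // Assumed lookup behavior: use the cached location if present,
  // otherwise "look it up" (here: the true location) and cache it.
  String locate(String region) {
    return cache.computeIfAbsent(region, r -> actualServer);
  }

  void openScanner(String region) throws NotServingRegionException {
    openAttempts++;
    String server = locate(region);
    if (!server.equals(actualServer)) {
      throw new NotServingRegionException(region + " is not on " + server);
    }
  }

  // Reset the scanner up to maxResets times; true if it eventually opens.
  boolean resetScanner(String region, int maxResets, boolean clearOnNsre) {
    for (int i = 0; i < maxResets; i++) {
      try {
        openScanner(region);
        return true;
      } catch (NotServingRegionException e) {
        if (clearOnNsre) {
          cache.remove(region); // proposed: drop the bad entry before the next try
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    StaleCacheResetSketch current = new StaleCacheResetSketch("rs2");
    current.cache.put("regionA", "rs1"); // stale entry
    System.out.println(current.resetScanner("regionA", 3, false)); // false
    System.out.println(current.openAttempts);                      // 3

    StaleCacheResetSketch proposed = new StaleCacheResetSketch("rs2");
    proposed.cache.put("regionA", "rs1");
    System.out.println(proposed.resetScanner("regionA", 3, true)); // true
    System.out.println(proposed.openAttempts);                     // 2
  }
}
```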

Perhaps I am mistaken and we do delete the bad cache entry somewhere, but in the case where the
server sends a java.net.ConnectException:
        } catch (Throwable t) {
          t = translateException(t);
          exceptions.add(t);
          if (tries == numRetries - 1) {
            throw new RetriesExhaustedException(callable.getServerName(),
                callable.getRegionName(), callable.getRow(), tries, exceptions);
          }

We will not delete the bad entry.
I understand that ConnectException can be thrown for various reasons; still, deleting the cache
entry might be a good idea, even when hbase.client.retries.number = 1.
Retrying a "connection refused" does not make sense to me either.
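A sketch of a retry loop that follows both suggestions. It is modeled loosely on the catch block quoted above, but callWithRetries, isLocationError and the stand-in exception classes are hypothetical, and treating ConnectException as a location error that is not retried is an assumption taken from this comment, not existing HBase behavior. Because the cache entry is dropped before the "retries exhausted" check, it is cleared even when numRetries is 1.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

class RetryLoopSketch {
  // Local stand-ins for the HBase exception classes.
  static class NotServingRegionException extends IOException {
    NotServingRegionException(String msg) { super(msg); }
  }

  static class RetriesExhaustedException extends IOException {
    final List<Throwable> causes;

    RetriesExhaustedException(int attempts, List<Throwable> causes) {
      super("Failed after " + attempts + " attempt(s)");
      this.causes = causes;
    }
  }

  final Map<String, String> cache = new HashMap<>(); // region -> cached server

  // Assumption: these failures suggest the cached location may be bad.
  static boolean isLocationError(Throwable t) {
    return t instanceof NotServingRegionException || t instanceof ConnectException;
  }

  <T> T callWithRetries(String region, int numRetries, Callable<T> call) throws IOException {
    if (numRetries < 1) {
      throw new IllegalArgumentException("numRetries must be >= 1");
    }
    List<Throwable> exceptions = new ArrayList<>();
    for (int tries = 0; ; tries++) {
      try {
        return call.call();
      } catch (Exception t) {
        exceptions.add(t);
        // Invalidate before the exhaustion check, so even numRetries == 1
        // leaves a clean cache for the next operation.
        if (isLocationError(t)) {
          cache.remove(region);
        }
        // Give up when out of retries, or on "connection refused" (not retried here).
        if (tries == numRetries - 1 || t instanceof ConnectException) {
          throw new RetriesExhaustedException(tries + 1, exceptions);
        }
      }
    }
  }

  public static void main(String[] args) {
    RetryLoopSketch client = new RetryLoopSketch();
    client.cache.put("regionA", "rs1:60020");
    try {
      client.callWithRetries("regionA", 1, () -> {
        throw new ConnectException("Connection refused");
      });
    } catch (IOException e) {
      System.out.println(e.getMessage()); // Failed after 1 attempt(s)
    }
    System.out.println(client.cache.containsKey("regionA")); // false
  }
}
```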

> Clean up client retry policies
> ------------------------------
>                 Key: HBASE-2445
>                 URL: https://issues.apache.org/jira/browse/HBASE-2445
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.20.5, 0.21.0
> Right now almost all retry behavior is governed by a single parameter that determines
> the number of retries. In a few places, there are also conf for the number of millis to sleep
> between retries. This isn't quite flexible enough. If we can refactor some of the retry logic
> into a RetryPolicy class, we could introduce exponential backoff where appropriate, clean
> up some of the config, etc.
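One possible shape for the RetryPolicy class the issue suggests, sketched in Java. The interface and its backoffMillis method are hypothetical; the base delay, cap and retry limit are arbitrary example values; and giving up immediately on ConnectException reflects the suggestion in the comment above rather than any agreed design. The delay doubles on each retry (100, 200, 400, 800 ms) and is capped at 1000 ms.

```java
import java.io.IOException;
import java.net.ConnectException;

class RetryPolicySketch {
  // Hypothetical interface; an actual HBase refactoring may look different.
  interface RetryPolicy {
    // Returns how long to sleep (ms) before retry number `retry` (1-based),
    // or -1 if the caller should give up and rethrow.
    long backoffMillis(int retry, Throwable lastError);
  }

  static class ExponentialBackoff implements RetryPolicy {
    private final int maxRetries;
    private final long baseMillis;
    private final long maxMillis;

    ExponentialBackoff(int maxRetries, long baseMillis, long maxMillis) {
      this.maxRetries = maxRetries;
      this.baseMillis = baseMillis;
      this.maxMillis = maxMillis;
    }

    @Override
    public long backoffMillis(int retry, Throwable lastError) {
      if (retry < 1) {
        throw new IllegalArgumentException("retry is 1-based");
      }
      if (retry > maxRetries || lastError instanceof ConnectException) {
        return -1; // out of retries, or an error assumed not worth retrying
      }
      // base * 2^(retry - 1), capped at maxMillis; the shift is bounded to avoid overflow.
      long delay = baseMillis << Math.min(retry - 1, 30);
      return Math.min(delay, maxMillis);
    }
  }

  public static void main(String[] args) {
    RetryPolicy policy = new ExponentialBackoff(5, 100, 1000);
    IOException err = new IOException("transient failure");
    for (int retry = 1; retry <= 6; retry++) {
      // prints 1 -> 100, 2 -> 200, 3 -> 400, 4 -> 800, 5 -> 1000, 6 -> -1
      System.out.println(retry + " -> " + policy.backoffMillis(retry, err));
    }
  }
}
```

Keeping the "give up or sleep" decision in one object is what would let different call sites (scanner resets, region lookups, puts) plug in different policies instead of sharing a single retry count.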
