hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8449) Refactor recoverLease retries and pauses informed by findings over in hbase-8389
Date Thu, 02 May 2013 21:08:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647905#comment-13647905 ]

stack commented on HBASE-8449:

Thinking on it, we could test the recoverLease result.  If false, wait a second or two and then
retry (for the case where the primary node is up, just taking its time).  If it comes back false
on the second invocation, then we wait what we think is the hdfs-side read timeout, dfs.socket.timeout
('public static int READ_TIMEOUT = 60 * 1000;'), or some good portion of it, and then leave the
loop w/o redoing recoverLease.  The read will likely fail but we have retrying going on around
it (and Jimmy just improved it over in hbase-8314).
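
To make the sequence above concrete, here is a minimal sketch of the two-step pause. It is not the HBase patch: the class, the Sleeper interface and the parameter names are all hypothetical, and the recoverLease call is passed in as a plain BooleanSupplier so the example runs without Hadoop on the classpath.

```java
import java.util.function.BooleanSupplier;

/** Sketch only: all names here are hypothetical, not HBase or HDFS APIs. */
final class LeaseRecoverySketch {

  /** Stand-in for Thread.sleep so the flow can be exercised without really waiting. */
  interface Sleeper {
    void sleep(long millis);
  }

  /**
   * Calls recoverLease at most twice. Returns true if either call reports
   * success; otherwise pauses for secondPauseMs and returns false without
   * calling recoverLease a third time.
   */
  static boolean recover(BooleanSupplier recoverLease, Sleeper sleeper,
                         long firstPauseMs, long secondPauseMs) {
    if (recoverLease.getAsBoolean()) {
      return true;                  // lease recovered on the first try
    }
    sleeper.sleep(firstPauseMs);    // short pause: primary may just be slow
    if (recoverLease.getAsBoolean()) {
      return true;                  // lease recovered on the retry
    }
    sleeper.sleep(secondPauseMs);   // long pause: roughly the hdfs-side read timeout
    return false;                   // leave the loop; caller's read retries take over
  }
}
```

A caller would pass something like `() -> dfs.recoverLease(path)` as the supplier and a small wrapper around Thread.sleep as the sleeper, with 1000 and 60000 as the two pauses.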

The amount of time to wait the second time should probably be configurable since there is no
way for us to know the hdfs configs (talking w/ Elliott, we should have the master ask the NN
and then have it publish the important configs in zk for regionservers to pick up: TODO).  We
can reuse the config added by hbase-8389 and default it to 60 seconds rather than the 4 it
is currently set to.
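
As a rough illustration of a configurable pause with a 60 second default, the sketch below reads the value from a java.util.Properties object standing in for the HBase configuration. The key name is made up for the example; it is not the key hbase-8389 added.

```java
import java.util.Properties;

/** Sketch only: the key name and class are hypothetical. */
final class LeasePauseConfigSketch {

  // Hypothetical key, NOT the real config name from hbase-8389.
  static final String PAUSE_KEY = "example.lease.recovery.second.pause.ms";

  // Default of 60 seconds, matching the hdfs-side READ_TIMEOUT quoted above.
  static final long DEFAULT_PAUSE_MS = 60L * 1000L;

  /** Returns the configured pause, or the default if unset, unparsable or non-positive. */
  static long secondPauseMs(Properties conf) {
    String raw = conf.getProperty(PAUSE_KEY);
    if (raw == null) {
      return DEFAULT_PAUSE_MS;
    }
    try {
      long value = Long.parseLong(raw.trim());
      return value > 0 ? value : DEFAULT_PAUSE_MS;
    } catch (NumberFormatException e) {
      return DEFAULT_PAUSE_MS;
    }
  }
}
```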

In another issue, we'd add a check of isFileClosed: if it returns true before the 60 seconds
expire, stop waiting and retry recoverLease.
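
One way the isFileClosed idea could look is sketched below, under stated assumptions: the check is passed in as a BooleanSupplier (in real code it would wrap the hdfs isFileClosed call where available), the sleeper is a LongConsumer standing in for Thread.sleep, and the poll interval is an invented parameter. A true result means the file closed early and the caller can retry recoverLease right away.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Sketch only: class, method and parameter names are hypothetical. */
final class ClosedFilePollSketch {

  /**
   * Polls isFileClosed every pollMs for up to maxWaitMs.
   * Returns true as soon as the file is reported closed, false if the
   * whole wait elapses without that happening.
   */
  static boolean waitForClose(BooleanSupplier isFileClosed, LongConsumer sleeper,
                              long maxWaitMs, long pollMs) {
    if (pollMs <= 0) {
      throw new IllegalArgumentException("pollMs must be positive");
    }
    long waited = 0;
    while (waited < maxWaitMs) {
      if (isFileClosed.getAsBoolean()) {
        return true;                              // closed early: stop waiting
      }
      long step = Math.min(pollMs, maxWaitMs - waited);
      sleeper.accept(step);                       // pause before the next check
      waited += step;
    }
    return isFileClosed.getAsBoolean();           // one last check at the deadline
  }
}
```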
> Refactor recoverLease retries and pauses informed by findings over in hbase-8389
> --------------------------------------------------------------------------------
>                 Key: HBASE-8449
>                 URL: https://issues.apache.org/jira/browse/HBASE-8449
>             Project: HBase
>          Issue Type: Bug
>          Components: Filesystem Integration
>    Affects Versions: 0.94.7, 0.95.0
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.95.1
> HBASE-8359 is an interesting issue that roams near and far.  This issue is about making
> use of the findings handily summarized on the end of hbase-8359 which have it that trunk needs
> a refactor around how it does its recoverLease handling (and that the patch committed against
> HBASE-8359 is not what we want going forward).
> This issue is about making a patch that adds a lag between recoverLease invocations where
> the lag is related to dfs timeouts -- the hdfs-side dfs timeout -- and optionally makes use
> of the isFileClosed API if it is available (a facility that is not yet committed to a branch
> near you and unlikely to be within your locality for a good while to come).

