hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8449) Refactor recoverLease retries and pauses informed by findings over in hbase-8389
Date Thu, 23 May 2013 21:03:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665670#comment-13665670 ]

stack commented on HBASE-8449:
------------------------------

bq. Increase hbase.lease.recovery.timeout default to 15 minutes, i.e. more than a standard
hdfs recovery.

I do not follow, [~nkeywal]. It is 15 minutes at the moment (this is just a copy of what was
there before).


bq. hbase.lease.recovery.dfs.timeout: it should not be less than 10s imho.

This is set to 61 seconds, which is how long we think it will take the NN to time out on the
datanode (dfs.socket.timeout, hopefully).

bq. ....it's as well that it seems that the NN seems not to like multiple calls to the recoverLease.


Yes, the aim w/ this patch is to not kill an ongoing lease recovery.

Regarding your proposal, it is built on a patch not yet committed to hdfs.  I am trying to get
something done now so that I can make a 0.95.1 release (what we have currently will reproduce
the scenario you mocked up, where you were calling the namenode every second).

bq. The master calls recover lease as a part of the distributed split. We can enhance it in
an other jira to give higher priority to closed wals vs. wals being recovered.

Yeah, that would be a good TODO for later.  All of your proposal seems to be for later rather
than now.

Are you +1 on what I have here, [~nkeywal]?



                
> Refactor recoverLease retries and pauses informed by findings over in hbase-8389
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-8449
>                 URL: https://issues.apache.org/jira/browse/HBASE-8449
>             Project: HBase
>          Issue Type: Bug
>          Components: Filesystem Integration
>    Affects Versions: 0.94.7, 0.95.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: 8449.txt, 8449v2.txt, 8449v3.txt, 8449v4.txt
>
>
> HBASE-8359 is an interesting issue that roams near and far.  This issue is about making
> use of the findings handily summarized on the end of hbase-8359, which have it that trunk
> needs a refactor around how it does its recoverLease handling (and that the patch committed
> against HBASE-8359 is not what we want going forward).
> This issue is about making a patch that adds a lag between recoverLease invocations, where
> the lag is related to dfs timeouts -- the hdfs-side dfs timeout -- and optionally makes use
> of the isFileClosed API if it is available (a facility that is not yet committed to a branch
> near you and unlikely to be within your locality for a good while to come).
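
The pacing described above can be sketched roughly as follows. This is only an illustration, not the attached patch: `LeaseClient`, `Sleeper`, `ScriptedClient`, the 4-second first pause, the one-second probe interval, and the example path are all assumptions made up for the sketch. Only the 61-second dfs timeout and the 15-minute overall timeout come from the discussion in this message. The idea shown is to call recoverLease, wait before calling it again so an ongoing recovery is not disturbed, and, where an isFileClosed-style probe exists, poll that instead of re-issuing recoverLease.

```java
public class LeaseRecoverySketch {

    /** Hypothetical stand-in for the two filesystem calls; not the real HDFS API. */
    interface LeaseClient {
        boolean recoverLease(String path);  // true once the file is closed
        boolean isFileClosed(String path);  // cheap probe, may be unavailable
    }

    /** Hypothetical sleeper so the pacing can be exercised without real waiting. */
    interface Sleeper {
        void sleep(long millis);
    }

    /**
     * Calls recoverLease, then pauses before each further call. The first pause
     * is short (assumed value); later pauses use the dfs timeout so the NN is
     * not asked again while a recovery is likely still in flight.
     */
    static boolean recover(LeaseClient client, String path, boolean canProbe,
                           long firstPauseMs, long dfsTimeoutMs, long overallTimeoutMs,
                           Sleeper sleeper) {
        long waitedMs = 0;
        long pauseMs = firstPauseMs;
        while (true) {
            if (client.recoverLease(path)) {
                return true;
            }
            if (waitedMs >= overallTimeoutMs) {
                return false;  // give up after the overall timeout
            }
            long sleptMs = 0;
            while (sleptMs < pauseMs) {
                // Sleep in one-second steps so the optional probe can end the wait early.
                long step = Math.min(1000L, pauseMs - sleptMs);
                sleeper.sleep(step);
                sleptMs += step;
                waitedMs += step;
                if (canProbe && client.isFileClosed(path)) {
                    return true;
                }
            }
            pauseMs = dfsTimeoutMs;
        }
    }

    /** Scripted fake for the demo: recoverLease succeeds on the given call number. */
    static final class ScriptedClient implements LeaseClient {
        final int succeedOnCall;
        int recoverCalls = 0;

        ScriptedClient(int succeedOnCall) {
            this.succeedOnCall = succeedOnCall;
        }

        public boolean recoverLease(String path) {
            return ++recoverCalls >= succeedOnCall;
        }

        public boolean isFileClosed(String path) {
            return false;  // probe never reports closed in this fake
        }
    }

    public static void main(String[] args) {
        ScriptedClient client = new ScriptedClient(3);
        long[] virtualWaitMs = {0};
        // Sleeps are only accumulated, not performed, so the demo returns at once.
        boolean ok = recover(client, "/hbase/.logs/example-wal", false,
                4_000L, 61_000L, 900_000L, ms -> virtualWaitMs[0] += ms);
        System.out.println("recovered=" + ok + " calls=" + client.recoverCalls
                + " virtualWaitMs=" + virtualWaitMs[0]);
    }
}
```

With the scripted fake succeeding on the third call, the loop issues three recoverLease calls separated by one 4-second pause and one 61-second pause, rather than one call per second.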

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
