hbase-issues mailing list archives

From "Chia-Ping Tsai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18390) Sleep too long when finding region location failed
Date Tue, 18 Jul 2017 13:38:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091554#comment-16091554 ]

Chia-Ping Tsai commented on HBASE-18390:
----------------------------------------

{noformat}
There is an interesting side effect: the client is informed immediately that the regionserver
died, so immediately goes to .meta. As the recovery is not done, .meta. contains the same
(dead) location, so the client fails again and comes back immediately to .meta. => We're
hammering .meta. now. The easy fix is to add a ~10s sleep on the client. A possibly better
fix from a mttr point of view would be to have the master sending messages to say that a server
recovery is finished. I will go for the former first.
{noformat}
What do you think about the comment from HBASE-7590? Does that side effect come back after
this patch is merged? If not, +1 from me.
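The client-side fix proposed in the quoted HBASE-7590 comment (a ~10s sleep once the client knows the region server is dead) can be sketched as a pause function with a floor. This is a minimal, hypothetical sketch, not HBase's code: the class and method names are invented, and the backoff multiplier table and the 10 000 ms constant are assumptions, loosely modeled on HBase's retry backoff. No jitter is added.

```java
public class DeadServerPause {
    // Assumed: the ~10 s floor suggested in the quoted HBASE-7590 comment.
    static final long MIN_WAIT_DEAD_SERVER_MS = 10_000;
    // Assumed backoff multipliers, loosely modeled on HBase's retry backoff table.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Table-driven backoff: base pause times a multiplier, capped at the last table entry. */
    static long backoff(long basePauseMs, int tries) {
        int idx = Math.min(tries, RETRY_BACKOFF.length - 1);
        return basePauseMs * RETRY_BACKOFF[idx];
    }

    /**
     * Pause before the next retry. When the client already knows the region server is dead,
     * it waits at least MIN_WAIT_DEAD_SERVER_MS so it does not re-query .meta. in a tight
     * loop while that server's recovery is still in progress.
     */
    static long pauseBeforeRetry(long basePauseMs, int tries, boolean serverKnownDead) {
        long pause = backoff(basePauseMs, tries);
        return serverKnownDead ? Math.max(pause, MIN_WAIT_DEAD_SERVER_MS) : pause;
    }

    public static void main(String[] args) {
        for (int tries = 0; tries < 4; tries++) {
            System.out.println("try " + tries + ": dead server " + pauseBeforeRetry(100, tries, true)
                    + " ms, live server " + pauseBeforeRetry(100, tries, false) + " ms");
        }
    }
}
```

With a 100 ms base pause, each of the first four retries against a known-dead server waits 10 000 ms instead of 100, 200, 300 and 500 ms. That trade-off is what stops the .meta. hammering, at the cost of slower recovery once the region is reassigned.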

> Sleep too long when finding region location failed
> --------------------------------------------------
>
>                 Key: HBASE-18390
>                 URL: https://issues.apache.org/jira/browse/HBASE-18390
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
>         Attachments: HBASE-18390.v01.patch, HBASE-18390.v02.patch, HBASE-18390.v03.patch
>
>
> If RegionServerCallable#prepare fails in getRegionLocation, the location in this callable
> object is null. Before we retry we sleep, but when the location is null we sleep at least
> 10 seconds, so the request fails immediately if the operation timeout is less than 10 seconds.
> I think there is no need to keep the MIN_WAIT_DEAD_SERVER logic; the backoff sleeping logic
> is fine for most cases.
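The interaction described in the issue (a 10-second floor versus an operation timeout shorter than 10 seconds) can be illustrated with a small simulation. This is a hypothetical sketch, not HBase's retry loop: the names are invented, the backoff table and the 10 000 ms constant are assumptions loosely modeled on HBase, and it assumes every attempt fails instantly (the location lookup keeps returning null), so only sleep time counts against the timeout. Like the behavior the issue describes, it gives up as soon as the next planned sleep would not fit in the remaining budget.

```java
public class RetryBudgetSketch {
    // Assumed constants for illustration only; not read from HBase itself.
    static final long MIN_WAIT_DEAD_SERVER_MS = 10_000;
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    static long backoff(long basePauseMs, int tries) {
        return basePauseMs * RETRY_BACKOFF[Math.min(tries, RETRY_BACKOFF.length - 1)];
    }

    /**
     * Counts how many retries fit in the operation timeout when every attempt fails
     * immediately because the region location is null.
     */
    static int retriesWithinTimeout(long operationTimeoutMs, long basePauseMs, boolean keepDeadServerFloor) {
        if (basePauseMs <= 0) {
            throw new IllegalArgumentException("base pause must be positive");
        }
        long remaining = operationTimeoutMs;
        int retries = 0;
        for (int tries = 0; ; tries++) {
            long sleep = backoff(basePauseMs, tries);
            if (keepDeadServerFloor) {
                // Old behavior: a null location forces at least MIN_WAIT_DEAD_SERVER.
                sleep = Math.max(sleep, MIN_WAIT_DEAD_SERVER_MS);
            }
            if (sleep > remaining) {
                return retries; // give up now rather than sleep past the timeout
            }
            remaining -= sleep;
            retries++;
        }
    }

    public static void main(String[] args) {
        System.out.println("with 10 s floor: " + retriesWithinTimeout(5_000, 100, true));
        System.out.println("backoff only:    " + retriesWithinTimeout(5_000, 100, false));
    }
}
```

With a 5 000 ms operation timeout and a 100 ms base pause, the floored variant gives up before its first retry (0 retries). The backoff-only variant fits 6 retries (100 + 200 + 300 + 500 + 1000 + 2000 ms) before the next 4000 ms sleep would overrun the budget.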



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
