hbase-issues mailing list archives

From "Peter Somogyi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13850) Check for dead server on CallTimeoutException
Date Wed, 15 Nov 2017 16:10:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253708#comment-16253708 ]

Peter Somogyi commented on HBASE-13850:

This issue is most probably solved in HBASE-15354.

> Check for dead server on CallTimeoutException
> ---------------------------------------------
>                 Key: HBASE-13850
>                 URL: https://issues.apache.org/jira/browse/HBASE-13850
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, MTTR
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Matteo Bertozzi
>            Assignee: huaxiang sun
>            Priority: Minor
>         Attachments: HBASE-13850-v0.patch, TestGetPerf.java
> WARN: this may be a misconfiguration, so let me know if there is a timeout param to set.
> {noformat}
> hbase-site.xml
> zookeeper.session.timeout 10000
> hbase.regionserver.storefile.refresh.period 10000
> hbase.client.operation.timeout 5000
> hbase.client.meta.operation.timeout 5000
> hbase.client.scanner.timeout.period 10000
> hbase.regionserver.lease.period 10000
> {noformat}
> I have a test that does a kill -STOP on a RS and then tries to query it.
> From the conf, the zk lease is 10 sec; the master correctly does the reassign after 10 sec and meta is updated.
> The client keeps trying to query the RS for a specific row until it gets a response. The table.get(row) call in the loop throws a CallTimeoutException every 5 sec (which is the configured setting), but instead of succeeding after 2 or 3 retries (> 10 sec, when the master reassigns), it keeps retrying for up to 60 sec (I don't know what that 60 sec is; maybe a conf param that I'm not able to find).
> One simple fix in the code is to handle the CallTimeoutException in RegionServerCallable and clear the meta cache for the RS that is not responding (but maybe there is already a conf to set to reduce that 60 sec period).
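
The timing argument in the description can be checked with a small toy simulation. This is only a sketch and not HBase code: the class, the method names, and the "rs1"/"rs2" server names are hypothetical, and the three constants are taken from the values quoted in the report (5 sec call timeout, 10 sec until reassignment, 60 sec of observed retrying). It assumes the client only re-reads meta when its cached location has been cleared.

```java
/**
 * Toy simulation of a client retrying a get against a frozen region server.
 * Not HBase code; every name here is hypothetical.
 */
class StaleLocationRetrySketch {
    static final long CALL_TIMEOUT_MS = 5_000;    // hbase.client.operation.timeout in the report
    static final long REASSIGN_AFTER_MS = 10_000; // zookeeper.session.timeout in the report
    static final long GIVE_UP_AFTER_MS = 60_000;  // the unexplained 60 sec from the report

    /** Simulated meta lookup: the region moves from rs1 to rs2 once the master reassigns it. */
    static String lookupInMeta(long nowMs) {
        return nowMs >= REASSIGN_AFTER_MS ? "rs2" : "rs1";
    }

    /**
     * Returns the number of get attempts until one succeeds, or -1 if the
     * client is still failing when GIVE_UP_AFTER_MS is reached.
     */
    static int attemptsUntilSuccess(boolean clearCacheOnTimeout) {
        long nowMs = 0;
        String cachedServer = "rs1"; // location cached before rs1 was stopped
        int attempts = 0;
        while (nowMs < GIVE_UP_AFTER_MS) {
            attempts++;
            if (cachedServer == null) {
                // Cache was cleared, so ask meta where the region lives now.
                cachedServer = lookupInMeta(nowMs);
            }
            if (cachedServer.equals("rs1")) {
                // rs1 is frozen: the call burns the full timeout and fails.
                nowMs += CALL_TIMEOUT_MS;
                if (clearCacheOnTimeout) {
                    cachedServer = null; // the fix proposed in the description
                }
                continue;
            }
            return attempts; // reached the live server
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("clear cache on timeout: " + attemptsUntilSuccess(true));
        System.out.println("keep stale cache:       " + attemptsUntilSuccess(false));
    }
}
```

Under these assumptions, clearing the cached location on each timeout lets the third attempt succeed (the first two time out at 5 sec and 10 sec, and the lookup at 10 sec sees the reassigned region), while keeping the stale location fails for the whole 60 sec window. The real client has more moving parts (backoff, RPC-level timeouts), so this only illustrates the reasoning, not the actual behavior.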

This message was sent by Atlassian JIRA
