hbase-dev mailing list archives

From "Matteo Bertozzi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-13850) Check for dead server on CallTimeoutException
Date Fri, 05 Jun 2015 18:45:01 GMT
Matteo Bertozzi created HBASE-13850:

             Summary: Check for dead server on CallTimeoutException
                 Key: HBASE-13850
                 URL: https://issues.apache.org/jira/browse/HBASE-13850
             Project: HBase
          Issue Type: Improvement
          Components: Client, MTTR
    Affects Versions: 2.0.0, 1.2.0
            Reporter: Matteo Bertozzi
            Assignee: Matteo Bertozzi
            Priority: Minor

WARN: this may be a misconfiguration, so let me know if there is a timeout param I should set.

zookeeper.session.timeout 10000
hbase.regionserver.storefile.refresh.period 10000
hbase.client.operation.timeout 5000
hbase.client.meta.operation.timeout 5000
hbase.client.scanner.timeout.period 10000
hbase.regionserver.lease.period 10000

I have a test that does a kill -STOP on a RS and then tries to query it.
From the conf, the ZK session timeout is 10 sec, and the master correctly reassigns the
regions after 10 sec and meta is updated.

The client keeps trying to query the RS for a specific row until it gets a response. The
table.get(row) call in the loop throws a CallTimeoutException every 5 sec (which is the
configured timeout), but instead of succeeding after 2 or 3 retries (by which point more than
10 sec have passed and the master has reassigned the regions) it keeps retrying for up to
60 sec (I don't know what that 60 sec is, maybe a conf param that I'm not able to find).

One simple fix in the code is to handle the CallTimeoutException in RegionServerCallable and
clear the meta cache for the RS that is not responding (but maybe there is already a conf
to set to reduce that 60 sec period).
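
The idea behind the proposed fix can be illustrated with a small simulation. This is only a sketch and does not use the HBase client API: the class MetaCacheSketch, the two maps standing in for the client location cache and for meta, and the server names rs1-dead and rs2-live are all hypothetical. It only shows why dropping the cached location of an unresponsive server on a timeout lets the next retry pick up the reassigned location.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation; none of these names come from HBase itself.
class MetaCacheSketch {
    // Stand-in for the client-side region location cache (row -> server).
    static final Map<String, String> locationCache = new HashMap<>();
    // Stand-in for meta, which the master has already updated.
    static final Map<String, String> meta = new HashMap<>();

    // Put the simulation into the state described in the report:
    // the client still caches the stopped server, meta points to the new one.
    static void reset() {
        locationCache.clear();
        meta.clear();
        locationCache.put("row1", "rs1-dead");
        meta.put("row1", "rs2-live");
    }

    // Use the cached location, falling back to meta only on a cache miss.
    static String locate(String row) {
        return locationCache.computeIfAbsent(row, meta::get);
    }

    // The stopped server never answers, so every call to it times out.
    static String callServer(String server, String row) throws TimeoutException {
        if ("rs1-dead".equals(server)) {
            throw new TimeoutException("call to " + server + " timed out");
        }
        return "value-of-" + row + "@" + server;
    }

    // Returns the attempt number that succeeded, or -1 if all attempts timed out.
    static int getWithRetries(String row, int maxAttempts, boolean clearOnTimeout) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String server = locate(row);
            try {
                callServer(server, row);
                return attempt;
            } catch (TimeoutException e) {
                if (clearOnTimeout) {
                    // Sketch of the fix: forget every cached location on that server.
                    locationCache.values().removeIf(server::equals);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        reset();
        System.out.println("without clearing: " + getWithRetries("row1", 5, false));
        reset();
        System.out.println("with clearing:    " + getWithRetries("row1", 5, true));
    }
}
```

In this simulation the stale entry is never refreshed when the cache is left alone, so every attempt times out. With the cache cleared on timeout, the second attempt re-reads the location and succeeds.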

This message was sent by Atlassian JIRA
