hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1815) HBaseClient can get stuck in an infinite loop while attempting to contact a failed regionserver
Date Tue, 15 Sep 2009 22:43:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755760#action_12755760 ]

stack commented on HBASE-1815:

HBaseClient also has this issue.  From the list:

Yeah, this is down in the guts of the hadoop rpc we use.  Around connection setup it has its own
config that is not well aligned with ours (ours being the retries and pause settings).

The max retries down in ipc is

this.maxRetries = conf.getInt("ipc.client.connect.max.retries", 10);

That's for an IOException other than a timeout.  For a timeout, it does this:

          } catch (SocketTimeoutException toe) {
            /* The max number of retries is 45,
             * which amounts to 20s*45 = 15 minutes retries.
             */
            handleConnectionFailure(timeoutFailures++, 45, toe);
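
As a quick check of the arithmetic in that comment, here is a minimal sketch that computes the worst-case time a caller can be held up when every connect attempt times out. The 45 retries and the 20s per-attempt timeout come from the comment above; the class and method names are made up for illustration and are not part of hadoop ipc.

```java
import java.time.Duration;

public class TimeoutBudget {
    // Worst-case time spent waiting when every attempt times out.
    // Ignores any pause between attempts, as the quoted comment does.
    static Duration worstCase(int maxRetries, Duration perAttemptTimeout) {
        return perAttemptTimeout.multipliedBy(maxRetries);
    }

    public static void main(String[] args) {
        Duration d = worstCase(45, Duration.ofSeconds(20));
        System.out.println(d.toMinutes() + " minutes"); // prints "15 minutes"
    }
}
```

So a client talking to a dead regionserver can sit in connection setup for roughly a quarter of an hour, regardless of what our own retry settings say.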

Let me file an issue to address the above.  The retries should be our retries... and in here
it has a hardcoded 1000ms that instead should be our pause.... Not hard to fix.
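
A rough sketch of what that fix could look like: one retry loop where both the retry count and the pause are passed in by the caller, and timeouts are counted against the same limit as any other IOException. Everything here (BoundedConnect, Connector, connectWithRetries) is a hypothetical stand-in, not the actual HBaseClient code; the caller would presumably feed in values read from our own settings (e.g. hbase.client.retries.number and hbase.client.pause).

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class BoundedConnect {
    /** Hypothetical stand-in for one connection setup attempt. */
    @FunctionalInterface
    interface Connector<T> {
        T connect() throws IOException;
    }

    /**
     * Tries to connect, retrying at most maxRetries times after the first
     * failure and sleeping pauseMs between attempts. SocketTimeoutException
     * is an IOException, so timeouts count against the same limit.
     */
    static <T> T connectWithRetries(Connector<T> connector, int maxRetries, long pauseMs)
            throws IOException, InterruptedException {
        int failures = 0;
        while (true) {
            try {
                return connector.connect();
            } catch (IOException e) {
                if (failures >= maxRetries) {
                    throw e; // give up: caller's retry budget is spent
                }
                failures++;
                Thread.sleep(pauseMs); // caller's pause, not a hardcoded 1000ms
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        try {
            connectWithRetries(() -> {
                attempts[0]++;
                throw new SocketTimeoutException("simulated dead regionserver");
            }, 3, 10L);
        } catch (IOException e) {
            System.out.println("gave up after " + attempts[0] + " attempts");
        }
    }
}
```

With maxRetries set to 3 the simulated connect is attempted four times (the initial try plus three retries) before the exception reaches the caller, so the client fails within a bound we control.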

> HBaseClient can get stuck in an infinite loop while attempting to contact a failed regionserver
> -----------------------------------------------------------------------------------------------
>                 Key: HBASE-1815
>                 URL: https://issues.apache.org/jira/browse/HBASE-1815
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.20.0
>         Environment: Ubuntu Linux (Linux <elided> 2.6.24-23-generic #1 SMP Wed
> Apr 1 21:43:24 UTC 2009 x86_64 GNU/Linux), java version "1.6.0_06", Java(TM) SE Runtime Environment
> (build 1.6.0_06-b02), Java HotSpot(TM) 64-Bit Server VM (build 10.0-b22, mixed mode)
>            Reporter: Justin Lynn
>             Fix For: 0.20.1
>         Attachments: thrift_server_log_excerpt, thrift_server_threaddump, thrift_server_threaddump_1
> While using the HBase Thrift server, if a regionserver goes down due to shutdown or failure,
> clients will time out because the thrift server cannot contact the dead regionserver.

