Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-dev@hadoop.apache.org
Message-ID: <2096629838.1253054637908.JavaMail.jira@brutus>
Date: Tue, 15 Sep 2009 15:43:57 -0700 (PDT)
From: "stack (JIRA)" <jira@apache.org>
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1815) HBaseClient can get stuck in an
 infinite loop while attempting to contact a failed regionserver
In-Reply-To: <676212266.1252016817456.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755760#action_12755760 ] 

stack commented on HBASE-1815:
------------------------------

HBaseClient also has this issue.  From the list:

Yeah, this is down in the guts of the hadoop rpc we use.  Around connection setup it has its own config that is not well aligned with ours (ours being the retries and pause settings).

The max retries down in ipc is

this.maxRetries = conf.getInt("ipc.client.connect.max.retries", 10);

That's for an IOE other than a timeout.  For a timeout, it does this:

          } catch (SocketTimeoutException toe) {
            /* The max number of retries is 45,
             * which amounts to 20s*45 = 15 minutes retries.
             */
            handleConnectionFailure(timeoutFailures++, 45, toe);

Let me file an issue to address the above.  The retries should be our retries... and in here it has a hardcoded 1000ms that instead should be our pause.... Not hard to fix.
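
A rough sketch of what that fix could look like, not the actual patch: a connect loop that takes both the retry count and the sleep between attempts from the client configuration instead of hardcoding 45 and 1000ms.  The class ConnectRetrySketch, the helpers getLong and connectWithRetries, and the Map standing in for the real Configuration are all hypothetical.  The key names "hbase.client.retries.number" and "hbase.client.pause" are assumed to be the retries and pause settings referred to above, and the fallback values are illustrative only.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

public class ConnectRetrySketch {

    // Hypothetical stand-in for a Configuration lookup with a default value.
    static long getLong(Map<String, String> conf, String key, long dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Long.parseLong(v.trim());
    }

    // Retries any IOException (SocketTimeoutException is a subclass) up to
    // maxRetries times, sleeping pauseMs between attempts, then rethrows
    // the last failure. Timeouts and other IOEs share one budget here.
    static <T> T connectWithRetries(Callable<T> attempt, int maxRetries, long pauseMs)
            throws Exception {
        int failures = 0;
        while (true) {
            try {
                return attempt.call();
            } catch (IOException e) {
                if (failures >= maxRetries) {
                    throw e;
                }
                failures++;
                Thread.sleep(pauseMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed key names; small values so the demo finishes quickly.
        Map<String, String> conf = new HashMap<String, String>();
        conf.put("hbase.client.retries.number", "3");
        conf.put("hbase.client.pause", "10");

        int retries = (int) getLong(conf, "hbase.client.retries.number", 10);
        long pause = getLong(conf, "hbase.client.pause", 1000);

        final int[] calls = {0};
        try {
            // Simulates a regionserver that never answers.
            connectWithRetries(() -> {
                calls[0]++;
                throw new SocketTimeoutException("regionserver unreachable");
            }, retries, pause);
        } catch (SocketTimeoutException e) {
            System.out.println("gave up after " + calls[0] + " attempts");
        }
    }
}
```

With retries set to 3 the loop makes one initial attempt plus three retries and then surfaces the exception to the caller, so a dead regionserver is reported after a bounded, configurable wait.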

> HBaseClient can get stuck in an infinite loop while attempting to contact a failed regionserver
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1815
>                 URL: https://issues.apache.org/jira/browse/HBASE-1815
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.20.0
>         Environment: Ubuntu Linux (Linux <elided> 2.6.24-23-generic #1 SMP Wed Apr 1 21:43:24 UTC 2009 x86_64 GNU/Linux), java version "1.6.0_06", Java(TM) SE Runtime Environment (build 1.6.0_06-b02), Java HotSpot(TM) 64-Bit Server VM (build 10.0-b22, mixed mode)
>            Reporter: Justin Lynn
>             Fix For: 0.20.1
>
>         Attachments: thrift_server_log_excerpt, thrift_server_threaddump, thrift_server_threaddump_1
>
>
> While using the HBase Thrift server, if a regionserver goes down due to shutdown or failure, clients will time out because the thrift server cannot contact the dead regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.