Date: Tue, 6 Aug 2013 15:38:47 +0000 (UTC)
From: "Julian Zhou (JIRA)"
To: dev@hbase.apache.org
Reply-To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-9139) Independent timeout configuration for rpc channel between cluster nodes

Julian Zhou created HBASE-9139:
----------------------------------

             Summary: Independent timeout configuration for rpc channel between cluster nodes
                 Key: HBASE-9139
                 URL: https://issues.apache.org/jira/browse/HBASE-9139
             Project: HBase
          Issue Type: Improvement
          Components: IPC/RPC, regionserver
    Affects Versions: 0.94.10, 0.96.0
            Reporter: Julian Zhou
            Priority: Minor
             Fix For: 0.94.11, 0.96.0

The default of "hbase.rpc.timeout" is 60000 ms (1 min). Users sometimes increase it to a larger value, such as 600000 ms (10 mins), for client applications that run many concurrent loads. Some users share the same hbase-site.xml between client and server.
HRegionServer#tryRegionServerReport reports to the live master over the rpc channel, but there is a window during master failover. In the scenario observed, the region server was trying to connect to a master that had just been killed. The backup master took the active role immediately and wrote itself to /hbase/master, but the region server was still waiting for the rpc timeout on its connection to the dead master. If "hbase.rpc.timeout" is set too long, master failover takes correspondingly long, because the region server waits out the full rpc timeout against the dead master.

Could we therefore split this into 2 options? "hbase.rpc.timeout" would remain the setting for the hbase client, while "hbase.rpc.internal.timeout" would apply to the regionserver/master rpc channel and could be set to a shorter value without affecting the real client rpc timeout.
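
To make the proposal concrete, here is a minimal sketch of how the two settings could be resolved from a shared configuration. It is only an illustration under stated assumptions: "hbase.rpc.internal.timeout" is the property name proposed in this issue and not an existing HBase setting, the RpcTimeouts class and its methods are hypothetical, a plain Map stands in for the contents of hbase-site.xml, and the fallback from the internal value to the client value is an assumed design choice rather than something the issue specifies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: resolves the two rpc timeouts
// from a plain key/value view of a shared hbase-site.xml.
final class RpcTimeouts {
    static final String CLIENT_KEY = "hbase.rpc.timeout";
    // Property name proposed in HBASE-9139; assumed here, not an existing setting.
    static final String INTERNAL_KEY = "hbase.rpc.internal.timeout";
    // Default of hbase.rpc.timeout as stated in the issue (1 min).
    static final int DEFAULT_TIMEOUT_MS = 60000;

    private RpcTimeouts() {}

    // Timeout for client rpc calls.
    static int clientTimeoutMs(Map<String, String> conf) {
        return parseOrDefault(conf.get(CLIENT_KEY), DEFAULT_TIMEOUT_MS);
    }

    // Timeout for the regionserver/master channel. Falls back to the client
    // value (assumption) so clusters that do not set the new key behave as before.
    static int internalTimeoutMs(Map<String, String> conf) {
        return parseOrDefault(conf.get(INTERNAL_KEY), clientTimeoutMs(conf));
    }

    // Parses an integer value, returning the fallback when it is missing or malformed.
    private static int parseOrDefault(String value, int fallback) {
        if (value == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CLIENT_KEY, "600000");   // raised for heavy client loading
        conf.put(INTERNAL_KEY, "10000");  // kept short for master failover
        System.out.println("client=" + clientTimeoutMs(conf)
                + " ms, internal=" + internalTimeoutMs(conf) + " ms");
    }
}
```

With a lookup like this, a site that shares one hbase-site.xml could raise the client timeout to 600000 ms while the region server's report to the master still gives up after the shorter internal value.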