Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 11 Feb 2015 02:42:12 +0000 (UTC)
From: "zhangduo (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12772446.1423074070000.9221.1423622532253@Atlassian.JIRA>
In-Reply-To: <JIRA.12772446.1423074070000@Atlassian.JIRA>
References: <JIRA.12772446.1423074070000@Atlassian.JIRA>
 <JIRA.12772446.1423074070192@arcas>
Subject: [jira] [Commented] (HBASE-12971) Replication stuck due to large
 default value for replication.source.maxretriesmultiplier
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-12971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315468#comment-14315468 ] 

zhangduo commented on HBASE-12971:
----------------------------------

{quote}
Let's not add another thing. If interval+count does not work, let's change the whole thing.
{quote}
Fine. We can do it in another issue if the current interval+count approach does not work.
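For readers unfamiliar with the "interval+count" model mentioned above, here is a minimal Python sketch. It assumes the retry loop behaves roughly like ReplicationSource: each failed attempt sleeps `interval * multiplier`, and `multiplier` grows by one per failure until it reaches the configured count (`replication.source.maxretriesmultiplier`). The function names are hypothetical. The 1000 ms interval is an assumed value for `replication.source.sleepforretries`, not something stated in this thread.

```python
# Hypothetical sketch of an "interval + count" linear backoff; not HBase code.


def sleep_ms(interval_ms: int, multiplier: int) -> int:
    """Time slept for one retry: the base interval scaled by the current multiplier."""
    return interval_ms * multiplier


def backoff_schedule(interval_ms: int, max_multiplier: int, attempts: int) -> list[int]:
    """Sleep times (ms) for successive failed attempts.

    The multiplier starts at 1 and grows by one per failure, but is capped
    at max_multiplier (the "count"), so the per-retry sleep plateaus there.
    """
    schedule = []
    multiplier = 1
    for _ in range(attempts):
        schedule.append(sleep_ms(interval_ms, multiplier))
        if multiplier < max_multiplier:
            multiplier += 1
    return schedule


if __name__ == "__main__":
    # Assumed 1000 ms interval with a count of 300: the per-retry sleep stops growing at 300 s.
    sched = backoff_schedule(1000, 300, 305)
    print(sched[:3], sched[-1])  # [1000, 2000, 3000] 300000
```

Under these assumptions the count bounds the per-retry sleep linearly, which is why a large value such as 300 is tolerable on this path.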

> Replication stuck due to large default value for replication.source.maxretriesmultiplier
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-12971
>                 URL: https://issues.apache.org/jira/browse/HBASE-12971
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 1.0.0, 0.98.10
>            Reporter: Adrian Muraru
>             Fix For: 2.0.0, 1.0.1, 1.1.0, 0.94.27, 0.98.11
>
>
> In hbase-site we set {{replication.source.maxretriesmultiplier}} to 300, the default value introduced in HBASE-11964.
> While this value works well for recovering from transient errors with the remote ZK quorum of the peer HBase cluster, it proved to have side effects in the code introduced in HBASE-11367 (Pluggable replication endpoint), where the default is much lower (10).
> See:
> 1. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L169
> 2. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L79
> The two default values definitely conflict: when {{replication.source.maxretriesmultiplier}} is set to 300 in hbase-site, a SocketTimeoutException leads to a sleep time of 300*300 seconds (25h!).
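To make the 300*300 arithmetic in the description concrete, here is a minimal Python sketch. It assumes, as the description implies, that the inter-cluster endpoint squares the multiplier after a SocketTimeoutException. It also assumes a 1 s base sleep (taken here as the default of `replication.source.sleepforretries`, 1000 ms). The function names are hypothetical; the sketch reproduces only the calculation, not HBase's code.

```python
# Hypothetical reproduction of the sleep-time arithmetic from HBASE-12971.

BASE_SLEEP_MS = 1000  # assumed default for replication.source.sleepforretries


def socket_timeout_sleep_ms(max_retries_multiplier: int,
                            base_sleep_ms: int = BASE_SLEEP_MS) -> int:
    """Sleep after a SocketTimeoutException, assuming multiplier**2 scaling."""
    return base_sleep_ms * max_retries_multiplier * max_retries_multiplier


def as_hours(ms: int) -> float:
    """Convert milliseconds to hours."""
    return ms / 1000 / 3600


if __name__ == "__main__":
    # Endpoint default of 10: 10 * 10 * 1 s.
    print(socket_timeout_sleep_ms(10) // 1000)     # 100 (seconds)
    # hbase-site value of 300: 300 * 300 * 1 s = 90,000 s.
    print(as_hours(socket_timeout_sleep_ms(300)))  # 25.0
```

The quadratic term is what makes a value tuned for ZK recovery (300) unsafe on this path, while the endpoint's own default of 10 keeps the wait under two minutes.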


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)