hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12971) Replication stuck due to large default value for replication.source.maxretriesmultiplier
Date Sat, 14 Feb 2015 07:46:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321286#comment-14321286 ]

Hudson commented on HBASE-12971:
--------------------------------

FAILURE: Integrated in HBase-1.0 #740 (See [https://builds.apache.org/job/HBase-1.0/740/])
HBASE-12971 Replication stuck due to large default value for replication.source.maxretriesmultiplier.
(larsh: rev 255dc4e58a2c1935cb07ac8e89148a0eeee52445)
* hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java


> Replication stuck due to large default value for replication.source.maxretriesmultiplier
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-12971
>                 URL: https://issues.apache.org/jira/browse/HBASE-12971
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 1.0.0, 0.98.10
>            Reporter: Adrian Muraru
>            Assignee: Lars Hofhansl
>             Fix For: 2.0.0, 1.0.1, 1.1.0
>
>         Attachments: 12971-v2.txt, 12971.txt
>
>
> In hbase-site we set {{replication.source.maxretriesmultiplier}} to 300, the default value introduced in HBASE-11964.
> While this value works fine for recovering from transient errors with the remote ZK quorum of the peer HBase cluster, it proved to have side effects in the code introduced in HBASE-11367 (Pluggable replication endpoint), where the default is much lower (10).
> See:
> 1. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L169
> 2. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L79
> The two default values are clearly in conflict: when {{replication.source.maxretriesmultiplier}} is set in hbase-site to 300, this leads to a sleep time of 300*300 (25h!) when a SocketTimeoutException is thrown.
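
The arithmetic behind the "300*300 (25h!)" figure can be sketched as below. This is an illustration under stated assumptions, not the HBase code itself: the class {{RetrySleepEstimate}}, the method {{socketTimeoutSleepMs}} and the constant {{BASE_SLEEP_MS}} are hypothetical names, the base pause of 1000 ms is assumed, and the squaring of the multiplier is inferred from the "300*300" in the report.

```java
import java.time.Duration;

// Hypothetical helper, not part of HBase: estimates the sleep that follows
// a socket timeout for a given replication.source.maxretriesmultiplier.
public class RetrySleepEstimate {

    // Assumption: the base pause between retries is 1000 ms.
    static final long BASE_SLEEP_MS = 1000L;

    // Assumption (inferred from the report's "300*300"): the multiplier
    // applied after a socket timeout is the square of the configured value.
    static long socketTimeoutSleepMs(int maxRetriesMultiplier) {
        long socketTimeoutMultiplier = (long) maxRetriesMultiplier * maxRetriesMultiplier;
        return BASE_SLEEP_MS * socketTimeoutMultiplier;
    }

    public static void main(String[] args) {
        // Compare the endpoint default (10) with the hbase-site value (300).
        for (int multiplier : new int[] {10, 300}) {
            Duration sleep = Duration.ofMillis(socketTimeoutSleepMs(multiplier));
            System.out.println("multiplier=" + multiplier + " -> sleep=" + sleep);
        }
    }
}
```

Under these assumptions a multiplier of 300 gives 90,000 seconds, which matches the 25 hours quoted in the report, while the endpoint default of 10 gives 100 seconds.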



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
