Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Sun, 10 May 2015 00:53:59 +0000 (UTC)
From: "Lars Hofhansl (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12827135.1430798609000.67278.1431219239977@Atlassian.JIRA>
In-Reply-To: <JIRA.12827135.1430798609000@Atlassian.JIRA>
References: <JIRA.12827135.1430798609000@Atlassian.JIRA>
 <JIRA.12827135.1430798609380@arcas>
Subject: [jira] [Updated] (HBASE-13618) ReplicationSource is too eager to
 remove sinks
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HBASE-13618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-13618:
----------------------------------
    Attachment: 13618-v2.txt

I like -v2 better since it simplifies the success code a bit.
Instead of decrementing the failure counter, a success now resets it.

[~apurtell], can you have a quick look again?

> ReplicationSource is too eager to remove sinks
> ----------------------------------------------
>
>                 Key: HBASE-13618
>                 URL: https://issues.apache.org/jira/browse/HBASE-13618
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 13618-v2.txt, 13618.txt
>
>
> While looking at replication for an unrelated reason, I noticed that the replication source might be a bit too eager to remove sinks from the list of valid sinks.
> The current logic allows a sink to fail N times (default 3), after which it is removed from the list of sinks. But note that this failure count is never reduced, so given enough runtime and some network glitches _every_ sink will eventually be removed. When all sinks have been removed, the source picks new sinks and the counter is set to 0 for all of them.
> I think we should change this to reset the counter each time we successfully replicate something to the sink (which proves the sink isn't dead). Or we could decrease the counter each time we successfully replicate, which might be better: if we consistently fail more attempts than we succeed, the sink should be removed.
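
The two options discussed above can be sketched as follows. This is a minimal illustration only, not the code in the attached patches: the class SinkFailureTracker, its methods, and the SuccessPolicy enum are hypothetical names, sinks are identified by plain strings, and a sink is assumed to be removed once its failure count reaches the threshold.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-sink failure tracker; not HBase's actual implementation. */
class SinkFailureTracker {
    /** How a successful replication affects the failure count. */
    enum SuccessPolicy { RESET, DECREMENT }

    private final int maxFailures;          // assumed threshold, e.g. 3
    private final SuccessPolicy policy;
    private final Map<String, Integer> failures = new HashMap<>();

    SinkFailureTracker(int maxFailures, SuccessPolicy policy) {
        this.maxFailures = maxFailures;
        this.policy = policy;
    }

    /**
     * Records one failed attempt against the sink.
     * Returns true if the sink has reached the threshold and should be removed.
     */
    boolean reportFailure(String sink) {
        int count = failures.merge(sink, 1, Integer::sum);
        if (count >= maxFailures) {
            failures.remove(sink);          // forget the count once the sink is dropped
            return true;
        }
        return false;
    }

    /** Records one successful attempt against the sink. */
    void reportSuccess(String sink) {
        if (policy == SuccessPolicy.RESET) {
            // A success proves the sink is alive: clear its failure count.
            failures.remove(sink);
        } else {
            // Each success cancels one earlier failure; never go below zero.
            failures.computeIfPresent(sink, (k, v) -> v > 1 ? v - 1 : null);
        }
    }

    /** Current failure count for the sink (0 if none recorded). */
    int failureCount(String sink) {
        return failures.getOrDefault(sink, 0);
    }
}
```

With RESET, a sink is removed only after the threshold number of failures in a row. With DECREMENT, a sink that fails more often than it succeeds is still removed eventually, while occasional glitches on an otherwise healthy sink are cancelled out by later successes.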

