hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13618) ReplicationSource is too eager to remove sinks
Date Wed, 06 May 2015 04:39:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529894#comment-14529894 ]

Lars Hofhansl commented on HBASE-13618:

Comments? Concerns?

The issue I am trying to fix affects a long-running region server. If over (say) a month
we successfully replicate hundreds of thousands of batches but just three batches fail due to
random temporary glitches (maybe the target cluster was rolling-restarted a few times), we'll
still remove the sink.

> ReplicationSource is too eager to remove sinks
> ----------------------------------------------
>                 Key: HBASE-13618
>                 URL: https://issues.apache.org/jira/browse/HBASE-13618
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 13618.txt
> Looking at the replication code for some other reason, I noticed that the replication source
> might be a bit too eager to remove sinks from the list of valid sinks.
> The current logic allows a sink to fail N times (default 3), after which it is removed
> from the sinks. But note that this failure count is never reduced, so given enough runtime
> and some network glitches _every_ sink will eventually be removed. When all sinks are removed,
> the source picks new sinks and the counter is set to 0 for all of them.
> I think we should change this to reset the counter each time we successfully replicate something
> to the sink (which proves the sink isn't dead). Or we could decrease the counter each time
> we successfully replicate; that might be better, since if we consistently fail more attempts
> than we succeed the sink should be removed.
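
To make the three behaviours in the description concrete, here is a minimal sketch of a per-sink failure counter. It is not the actual HBase code and not the attached patch: the class name SinkFailureTracker, the SuccessPolicy enum, the method names, and the use of a plain String as the sink identifier are all hypothetical. It also assumes that a sink is removed on its Nth recorded failure, which is one reading of "fail N times".

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of per-sink failure tracking.
 * All names here are illustrative; this is not the HBase replication API.
 */
final class SinkFailureTracker {

    /** What a successful replication does to the failure count. */
    enum SuccessPolicy {
        NEVER_REDUCE, // behaviour described as the current logic
        RESET,        // first proposal: a success clears the count
        DECREMENT     // second proposal: a success lowers the count by one
    }

    private final int maxFailures;
    private final SuccessPolicy policy;
    private final Map<String, Integer> failures = new HashMap<>();

    SinkFailureTracker(int maxFailures, SuccessPolicy policy) {
        this.maxFailures = maxFailures;
        this.policy = policy;
    }

    /**
     * Records one failed batch for the sink.
     * Returns true if the sink has now reached maxFailures and should be
     * removed (assumption: removal happens on the Nth failure).
     */
    boolean reportFailure(String sink) {
        int count = failures.merge(sink, 1, Integer::sum);
        if (count >= maxFailures) {
            failures.remove(sink); // forget the sink once it is dropped
            return true;
        }
        return false;
    }

    /** Records one successfully replicated batch for the sink. */
    void reportSuccess(String sink) {
        switch (policy) {
            case RESET:
                failures.remove(sink);
                break;
            case DECREMENT:
                // Lower the count by one; drop the entry when it reaches zero.
                failures.computeIfPresent(sink, (k, v) -> v > 1 ? v - 1 : null);
                break;
            default:
                // NEVER_REDUCE: successes leave the count untouched.
                break;
        }
    }

    /** Current failure count for the sink (0 if none recorded). */
    int failureCount(String sink) {
        return failures.getOrDefault(sink, 0);
    }
}
```

With NEVER_REDUCE, three isolated failures spread over any number of successes still remove the sink. With RESET, only N consecutive failures do. With DECREMENT, a sink is removed only when failures outpace successes by N, which matches the "consistently fail more attempts than we succeed" criterion.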

This message was sent by Atlassian JIRA
