cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
Date Wed, 05 Apr 2017 21:30:42 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ariel Weisberg resolved CASSANDRA-13327.
----------------------------------------
    Resolution: Not A Problem

Closing to open a ticket specific to relaxing the limit on number of pending endpoints and
CAS.

> Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13327
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13327
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>
> Consider this ring:
> 127.0.0.1  MR UP     JOINING -7301836195843364181
> 127.0.0.2    MR UP     NORMAL -7263405479023135948
> 127.0.0.3    MR UP     NORMAL -7205759403792793599
> 127.0.0.4   MR DOWN     NORMAL -7148113328562451251
> where 127.0.0.1 was bootstrapping for cluster expansion. Note that, due to the failure
of 127.0.0.4, 127.0.0.1 was stuck trying to stream from it and making no progress.
> Then the down node was replaced so we had:
> 127.0.0.1  MR UP     JOINING -7301836195843364181
> 127.0.0.2    MR UP     NORMAL -7263405479023135948
> 127.0.0.3    MR UP     NORMAL -7205759403792793599
> 127.0.0.5   MR UP     JOINING -7148113328562451251
> It’s confusing in the ring - the first JOINING is a genuine bootstrap, the second is
a replacement. We now had CAS unavailables (but no non-CAS unvailables). I think it’s because
the pending endpoints check thinks that 127.0.0.5 is gaining a range when it’s just replacing.
> The workaround is to kill the stuck JOINING node, but Cassandra shouldn’t unnecessarily
fail these requests.
> It also appears like required participants is bumped by 1 during a host replacement so
if the replacing host fails you will get unavailables and timeouts.
> This is related to the check added in CASSANDRA-8346



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message