cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
Date Wed, 29 Mar 2017 19:08:41 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947725#comment-15947725
] 

Paulo Motta commented on CASSANDRA-13327:
-----------------------------------------

bq. To answer your question. What I believe happens is that while streaming is occurring the
replacing node remains in the joining state.

This is designed behavior per CASSANDRA-8523. "JOINING" is just a display name in nodetool,
so we should probably fix that to show REPLACING instead. Internally it means the node is
trying to join the ring with the same tokens as the node it is replacing, which takes effect
only if the operation completes successfully; that's why it is in the pending state.

bq. If the down node is not coming back and you are replacing it why should there be unavailables?

The unavailables only happened because there were 2 pending nodes in the requested range (the
joining node AND the replacing node), and the current CAS design forbids more than 1 pending
endpoint in the requested range (CASSANDRA-8346).
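
A minimal sketch of that rule, in Python rather than Cassandra's Java, may make the failure easier to see. The names `UnavailableError` and `check_cas_pending` are hypothetical and are not Cassandra's API; the only thing taken from the discussion is that a CAS request is rejected when more than one endpoint is pending for the requested range.

```python
class UnavailableError(Exception):
    """Hypothetical stand-in for the coordinator's unavailable error."""


def check_cas_pending(pending_endpoints):
    """Reject a CAS request when more than one endpoint is pending for the range."""
    if len(pending_endpoints) > 1:
        raise UnavailableError(
            "more than one (%d) pending endpoint in the requested range"
            % len(pending_endpoints)
        )


# Before the replacement: only the bootstrapping node is pending, so CAS proceeds.
check_cas_pending(["127.0.0.1"])

# After the replacement starts: the bootstrapping node and the replacing node
# are both pending, so the check rejects the request.
try:
    check_cas_pending(["127.0.0.1", "127.0.0.5"])
    rejected = False
except UnavailableError:
    rejected = True
```

Under this sketch, killing the stuck joining node (the workaround from the description) brings the pending list back to one entry and the check passes again.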

bq. The question for me is whether the replacing node is really pending? What is the definition
of pending and why should it include a replacing node?

The replacing node is pending because we cannot count it as an ordinary node towards
the consistency level. Otherwise, if the replace operation fails, the operations that used the
replacement node as a member of the quorum would become inconsistent. That's why CASSANDRA-833
added pending/joining nodes as additional members of the cohort.
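
To illustrate the effect on the quorum size, here is a small Python sketch. It assumes a replication factor of 3 and a simple majority taken over natural plus pending endpoints; both the formula and the function name `required_participants` are assumptions for illustration, not a quote of the Cassandra code.

```python
def required_participants(natural, pending):
    """Majority of the cohort when pending endpoints are added as extra members."""
    participants = natural + pending
    return participants // 2 + 1


RF = 3  # assumed replication factor, not stated in the ticket

# No range movement: a plain majority of the 3 natural replicas.
normal = required_participants(RF, 0)

# One pending (joining or replacing) endpoint: the cohort grows to 4 members.
during_replace = required_participants(RF, 1)
```

With these assumptions the requirement goes from 2 to 3 while one node is pending, which matches the observation in the description that required participants is bumped by 1 during a host replacement.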

> Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13327
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13327
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>
> Consider this ring:
> 127.0.0.1  MR  UP    JOINING  -7301836195843364181
> 127.0.0.2  MR  UP    NORMAL   -7263405479023135948
> 127.0.0.3  MR  UP    NORMAL   -7205759403792793599
> 127.0.0.4  MR  DOWN  NORMAL   -7148113328562451251
> where 127.0.0.1 was bootstrapping for cluster expansion. Note that, due to the failure
of 127.0.0.4, 127.0.0.1 was stuck trying to stream from it and making no progress.
> Then the down node was replaced so we had:
> 127.0.0.1  MR  UP    JOINING  -7301836195843364181
> 127.0.0.2  MR  UP    NORMAL   -7263405479023135948
> 127.0.0.3  MR  UP    NORMAL   -7205759403792793599
> 127.0.0.5  MR  UP    JOINING  -7148113328562451251
> It’s confusing in the ring - the first JOINING is a genuine bootstrap, the second is
a replacement. We now had CAS unavailables (but no non-CAS unavailables). I think it’s because
the pending endpoints check thinks that 127.0.0.5 is gaining a range when it’s just replacing.
> The workaround is to kill the stuck JOINING node, but Cassandra shouldn’t unnecessarily
fail these requests.
> It also appears that required participants is bumped by 1 during a host replacement, so
if the replacing host fails you will get unavailables and timeouts.
> This is related to the check added in CASSANDRA-8346



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
