cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10423) Paxos/LWT failures when moving node
Date Thu, 01 Oct 2015 12:12:27 GMT


Sylvain Lebresne commented on CASSANDRA-10423:

bq. Why would there be more CAS contention just because a node is moving?

When a node is "pending" (moving or bootstrapping), it is counted as a "paxos participant"
(the reason is basically the same as in CASSANDRA-833). And more participants in the paxos
rounds do mean a potentially higher risk that the algorithm fails to make progress due to contention.
That could explain more timeouts while moving/bootstrapping a node, though I don't know if
that explains going from no timeouts to 50% of requests timing out (assuming that's what you
see). We probably need to be able to reproduce this if we want to find the cause ([~enigmacurry],
do you think you can find someone to look at it?), but for that, more information on your case
would greatly help (like your number of nodes, replication settings, whether your usage of
LWT is likely to contend a lot or not, ...).
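To make the participant argument concrete, here is a minimal Java sketch. It assumes that a Paxos round must hear from a quorum of the natural replicas plus every pending replica (the CASSANDRA-833-style rule referred to above). The class and method names are hypothetical and are not Cassandra's API.

```java
// Hypothetical sketch, not Cassandra's code: how many replicas a Paxos round
// involves, assuming the rule "quorum of natural replicas + every pending replica".
public class PaxosParticipants {

    /** Quorum of the natural (current) replicas for a token range. */
    static int quorum(int naturalReplicas) {
        return naturalReplicas / 2 + 1;
    }

    /** Replicas that must respond when pendingReplicas nodes are moving/bootstrapping into the range. */
    static int requiredParticipants(int naturalReplicas, int pendingReplicas) {
        return quorum(naturalReplicas) + pendingReplicas;
    }

    /** All replicas taking part in the round (natural + pending). */
    static int totalParticipants(int naturalReplicas, int pendingReplicas) {
        return naturalReplicas + pendingReplicas;
    }

    public static void main(String[] args) {
        int rf = 3;
        for (int pending = 0; pending <= 1; pending++) {
            System.out.printf("RF=%d pending=%d -> %d participants, %d required%n",
                    rf, pending, totalParticipants(rf, pending),
                    requiredParticipants(rf, pending));
        }
    }
}
```

Under this assumption, running it prints `RF=3 pending=0 -> 3 participants, 2 required` and `RF=3 pending=1 -> 4 participants, 3 required`. A single moving node means each round needs more replicas to respond, which gives competing ballots more chances to interfere.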

> Paxos/LWT failures when moving node
> -----------------------------------
>                 Key: CASSANDRA-10423
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra version: 2.0.14
> Java-driver version: 2.0.11
>            Reporter: Roger Schildmeijer
> While moving a node (nodetool move <newtoken>) we noticed that LWT started failing
> for some (~50%) requests. The java-driver (version 2.0.11) returned com.datastax.driver.core.exceptions.WriteTimeoutException:
> Cassandra timeout during write query at consistency SERIAL (7 replica were required but only
> 0 acknowledged the write). The cluster was not under heavy load.
> I noticed that the failed LWT requests all took just above 1s. That information and the
> WriteTimeoutException could indicate that this happens:
> I can't explain why though. Why would there be more CAS contention just because a node
> is moving?
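The reporter's observation (failed LWT requests taking just above 1 s and reporting 0 acknowledgements) is consistent with a coordinator that keeps retrying contended Paxos rounds until a deadline and then gives up. The sketch below is a simplified model under that assumption, not Cassandra's implementation. It assumes the deadline plays the role of `cas_contention_timeout_in_ms` in cassandra.yaml (1000 ms by default). `CasContentionModel`, `CasTimeout`, and the simulated round and backoff costs are hypothetical, and time is simulated so the result is deterministic.

```java
import java.util.function.IntPredicate;

// Simplified, hypothetical model: a CAS coordinator retries Paxos rounds until a
// contention deadline expires, then reports a timeout with zero acknowledgements.
public class CasContentionModel {

    /** Models the WriteTimeoutException a client would see after contention. */
    static class CasTimeout extends RuntimeException {
        final long elapsedMillis;
        final int received;

        CasTimeout(long elapsedMillis, int received, int required) {
            super(String.format("timeout after %d ms (%d replica were required but only %d acknowledged)",
                    elapsedMillis, required, received));
            this.elapsedMillis = elapsedMillis;
            this.received = received;
        }
    }

    /**
     * Returns the number of attempts used if some round wins before the deadline.
     * roundWins.test(i) says whether attempt i wins its ballot (false = lost to a competitor).
     */
    static int cas(long contentionTimeoutMillis, long roundMillis, long backoffMillis,
                   int required, IntPredicate roundWins) {
        long elapsed = 0; // simulated clock
        int attempt = 0;
        while (elapsed < contentionTimeoutMillis) {
            attempt++;
            elapsed += roundMillis; // cost of one Paxos round trip
            if (roundWins.test(attempt)) {
                return attempt;
            }
            elapsed += backoffMillis; // back off before retrying with a higher ballot
        }
        // The deadline is only checked between attempts, so the total ends up slightly past it.
        throw new CasTimeout(elapsed, 0, required);
    }

    public static void main(String[] args) {
        // Low contention: the second round wins.
        System.out.println("attempts: " + cas(1000, 10, 35, 7, i -> i >= 2));
        // Persistent contention: every round loses to a competing ballot.
        try {
            cas(1000, 10, 35, 7, i -> false);
        } catch (CasTimeout e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With these made-up costs, the contended case fails after 23 attempts with `timeout after 1035 ms (7 replica were required but only 0 acknowledged)`. This mirrors the "just above 1s, 0 acknowledged" pattern, though it does not by itself explain why contention would rise during a move.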

This message was sent by Atlassian JIRA
