cassandra-user mailing list archives

From Vasileios Vlachos <vasileiosvlac...@gmail.com>
Subject CL not satisfied when new node is joining?
Date Tue, 16 Jun 2015 21:35:20 GMT
Hello,

We have a demo cassandra cluster (version 1.2.18) with two DCs and 4 nodes
in total. Our client is using the Datastax C# driver (version 1.2.7).
RF='DC1':2, 'DC2':2. The consistency level is set to LOCAL_QUORUM and all
traffic is coming directly from the application servers in DC1, which then
asynchronously replicates to DC2 (so the LOCAL DC from the application's
perspective is DC1). There are two nodes in each DC and, even though this is
only a demo cluster, we thought it would be nice to add another node in each DC
so that it can handle failures and maintenance downtime.

We started by adding a new node to DC2 as per instructions here:
http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_add_node_to_cluster_t.html

Almost immediately after the cassandra process was started on this new
node, the application began logging exceptions like the following:

...
System.AggregateException: One or more errors occurred. --->
Cassandra.WriteTimeoutException: Cassandra timeout during write query at
consistency LOCALQUORUM (2 replica(s) acknowledged the write over 3
required)
...

and several other timeouts. During this process we were tailing
system.log on all 5 cassandra nodes and saw no errors or warnings.
The application, however, kept logging exceptions like the one
above until the new node had streamed all its data and gone from 'UJ' to 'UN'
state in the output of nodetool status. Since the node fully joined the
cluster we have not seen any similar exceptions. Not sure if
this is related or not, but we also noticed a schema disagreement in the
cluster while adding the new node:

new_node: 01f0eb0b-82d6-38de-b943-d4f31ca29b98
all other nodes: 2aa39f66-0f1a-3202-8c28-8469ebfdf622

We fixed this by restarting the new node after it had joined the cluster.
All nodes now agree that the schema version is
01f0eb0b-82d6-38de-b943-d4f31ca29b98 (not sure why; I would have expected the
new_node to adopt the version held by the rest).

Initially we thought the issue was related to this:
https://issues.apache.org/jira/browse/CASSANDRA-833

but the more we read about it the more unrelated it feels, plus it appears
to be fixed in the version we are running.

We tried reproducing the issue on a local cluster but we were unable to do
so.

Shouldn't LOCAL_QUORUM require 2 local replicas instead of 3 during the
time the new node was joining the cluster? There are not 3 local replicas
anyway.
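
To make the arithmetic behind this question concrete, here is a minimal
Python sketch of the quorum calculation. The function names are
hypothetical and this is not Cassandra code. The pending_replicas term is
purely an assumption used to show one way the number 3 could arise (a
quorum of 2 plus one joining replica being counted); nothing in our logs
confirms that this is what actually happened.

```python
def quorum(replication_factor: int) -> int:
    """Quorum for a given replication factor: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(local_rf: int, pending_replicas: int = 0) -> int:
    """Acknowledgements a LOCAL_QUORUM write would wait for.

    ASSUMPTION: pending (joining) replicas are added on top of the
    quorum. This is a hypothesis to explain the observed "3 required",
    not confirmed behaviour of Cassandra 1.2.18.
    """
    return quorum(local_rf) + pending_replicas


if __name__ == "__main__":
    # RF='DC1':2, so the quorum we expected in the local DC:
    print("expected:", required_acks(2))
    # What the exception reported would match quorum + one pending replica:
    print("with one pending replica:", required_acks(2, pending_replicas=1))
```

With RF 2 the first call yields the 2 we expected and the second yields
the 3 reported in the exception, which is why we suspect the joining node
is somehow being counted.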

Thanks for any help.

Vasilis
