incubator-cassandra-user mailing list archives

From Sergey Olefir
Subject Range Queries consistency in an inconsistent cluster.
Date Thu, 07 Feb 2013 15:20:57 GMT

I'm somewhat lost with regard to the results I can expect from running range
queries in a (temporarily) 'inconsistent' cluster (e.g. if a node has been
down for some time and hasn't caught up yet).

Suppose I have 4 nodes in 2 DCs (Cassandra 1.1.7):
DCa: a1 and a2
DCb: b1 and b2
I'm using the ByteOrdered partitioner and the nodes are balanced (tokens are
set properly to split data evenly in each DC; tokens in DCb are [DCa + 1]).

I'm running with replication DCa:2, DCb:2 (each node contains full data).
I'm using counters only and I'm putting a heavy load on the cluster (say 10k
increments per second). The writes are directed to a1 and a2 only; b1 and b2
are for backup and possibly for running queries against (haven't decided
yet). I monitor the
cluster via nodetool and see that data load is even on all nodes (as is
expected).

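To make the layout above concrete, here is a minimal Python sketch of how
replicas would be chosen for a key under this setup. It is a simplified model
of per-DC replica placement (walk the ring clockwise and take the first N
nodes of each DC, ignoring racks), not Cassandra's actual code. The single-byte
token values and the names RING, REPLICATION and replicas_for are assumptions
made up for illustration; only the node names, the replication factors and the
"DCb = DCa + 1" token rule come from the setup described here.

```python
from bisect import bisect_left

# Hypothetical single-byte tokens: a1/a2 split the byte range evenly,
# and each DCb token is the matching DCa token + 1.
RING = {
    b"\x00": ("DCa", "a1"),
    b"\x01": ("DCb", "b1"),
    b"\x80": ("DCa", "a2"),
    b"\x81": ("DCb", "b2"),
}
REPLICATION = {"DCa": 2, "DCb": 2}


def replicas_for(key):
    """Return replica nodes for a key, in ring-walk order (simplified model)."""
    tokens = sorted(RING)
    # With a byte-ordered partitioner the key itself acts as the token.
    # The first node token >= key owns it; wrap around past the last token.
    start = bisect_left(tokens, key) % len(tokens)
    needed = dict(REPLICATION)
    chosen = []
    for i in range(len(tokens)):
        dc, node = RING[tokens[(start + i) % len(tokens)]]
        if needed[dc] > 0:  # this DC still needs replicas
            chosen.append(node)
            needed[dc] -= 1
    return chosen


if __name__ == "__main__":
    for key in (b"\x00", b"\x42", b"\xff"):
        print(key, replicas_for(key))
```

Under this model every key maps to all four nodes, which matches "each node
contains full data"; only the order of the walk differs, depending on whose
token range the key falls into.
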
Now a2 goes down. I can immediately see that a1's data load grows very
rapidly (because of hints for a2). After half an hour a2 comes back up. I
know from experience that it'll take hours before all hints from a1 are
delivered to a2.

What is going to happen with range queries directed to a1 & a2 while a2
catches up?

As far as I understand, there's no read-repair when doing range queries, so
there's no usual assurance of "wrong once, correct next time around".

- Does the consistency level setting apply to range queries?
- If I direct a query to a1 (which is up-to-date), will it go to a2 for the
slice that 'belongs' to a2? (even though a1 has a full replica of the data)
- If I direct a query to a2 (which is NOT up-to-date), is it smart enough to
go to a1 for data?
- In general, considering I have a cluster with 3 nodes up-to-date and one
that is not, is there a way to run a query that'll return up-to-date data
(i.e. will not use data from a2)?

Also, what if a2 has been down for longer than the hint window (1 hour by
default)? Is Cassandra smart enough to avoid using a2 for range queries
while it is inconsistent?

Thanks in advance,
