cassandra-user mailing list archives

From Aaron Morton <>
Subject Re: Question about consistency level & data propagation & eventually consistent
Date Thu, 11 Nov 2010 20:08:02 GMT
Do you have some existing performance issues that you are trying to resolve? It's easier to
improve performance if you know you have X nodes and Y requests.

Each write will be sent to all replicas that are up; the number of replicas is determined by your RF
(set RF to the number of nodes in the cluster to store the data on every node). The CL tells the coordinating
node to wait for X replicas to acknowledge the write before returning to the client.
The remaining replicas will continue to complete the write.
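The behaviour above can be sketched in a few lines. This is a toy model, not Cassandra's actual code, and all names in it are illustrative: the coordinator sends the write to every live replica regardless of CL, but the CL only controls how many acknowledgements it waits for.

```python
# Toy model of coordinator-side write handling (illustrative, not Cassandra internals).

def required_acks(cl, rf):
    """How many replica acks the coordinator waits for before returning."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError("unknown consistency level: %s" % cl)

def coordinate_write(live_replicas, rf, cl):
    """The write goes to *all* live replicas regardless of CL; success only
    depends on whether enough of them can acknowledge it."""
    needed = required_acks(cl, rf)
    acks = len(live_replicas)  # assume every live replica eventually acks
    return acks >= needed

print(coordinate_write(["n1", "n2"], rf=3, cl="QUORUM"))  # True: 2 of 3 is a quorum
print(coordinate_write(["n1"], rf=3, cl="QUORUM"))        # False: a quorum of 3 needs 2 acks
```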

If a replica was down when the write started, it will not have data sent to it. If a replica
failed the write, and the CL was still reached, a hinted handoff (HH) will be stored somewhere so the
data can be replayed later. (I think HH will not be used if the node was down before the write started.)

It sounds like you want to write at QUORUM and read at ONE, and possibly reduce the Read Repair
chance to reduce the number of replicas included in each read. I would suggest reading and writing
at QUORUM to start with; if you have performance issues, look at tuning the caches and/or adjusting
the CL for reads.
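The trade-off here comes down to a simple overlap rule: a read is guaranteed to see the latest write only when the write and read replica counts intersect, i.e. W + R > RF. A minimal sketch of that arithmetic (names are illustrative, not any driver's API):

```python
# Consistency overlap arithmetic: reads see the latest write iff W + R > RF.

RF = 3
QUORUM = RF // 2 + 1  # 2 for RF=3

def overlaps(w, r, rf=RF):
    """True if every read replica set must intersect every write replica set."""
    return w + r > rf

print(overlaps(QUORUM, QUORUM))  # True:  2 + 2 > 3, QUORUM/QUORUM is strongly consistent
print(overlaps(QUORUM, 1))       # False: 2 + 1 = 3, QUORUM write + ONE read can return stale data
print(overlaps(RF, 1))           # True:  3 + 1 > 3, ALL write + ONE read is consistent
```

This is why writing at QUORUM and reading at ONE gives up the read-your-writes guarantee, while QUORUM/QUORUM keeps it.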

Hope that helps.
On 11 Nov, 2010, at 11:57 PM, Thibaut Britz <> wrote:


Thanks for all the informative answers.

Since writing is much faster than reading, I assume it's faster to write the data to
3 replicas and read from 1 instead of writing to 2 and reading from at least 2 (especially
if I execute the read operation multiple times on the same key). I could then easily double
my read performance.

I would then like to do the following: always write to all nodes which are marked as up, then
read from one replica. If one node goes down (hardware failure/Cassandra down) I would run
the repair tool and fix the node, which shouldn't happen very often. I can also deal with
very small inconsistencies.

- What consistency level would I have to choose? ALL will fail if one node is down; QUORUM will
only write to the quorum. I would need something that will write to all nodes which are marked
as up.
- If I choose QUORUM, what will happen to the remaining writes if the node is marked as up?
Will they always be executed, or can they be dropped (e.g. the node doing a compaction while the
write happens)?
- To bring a node back into the system, I would run the repair command on the node. Is there
a way to do an offline repair (so I make sure that my application won't read from this node)?
I guess changing the port temporarily will not work, since Cassandra will communicate the node's
address to my client through the other nodes?


On Wed, Nov 10, 2010 at 5:52 PM, Jonathan Ellis <> wrote:
On Wed, Nov 10, 2010 at 8:54 AM, Thibaut Britz
<> wrote:
> Assuming I'm reading and writing with consistency level 1 (one), read repair
> turned off, I have a few questions about data propagation.
> Data is being stored with a replication factor of 3.
> 1) If all nodes are up:
>  - Will all writes eventually reach all nodes (of the 3 nodes)?


>  - What will be the maximal time until the last write reaches the last node?

Situation-dependent.  The important thing is that if you are writing
at CL.ALL, it will be before the write is acked to the client.

> 2) If one or two nodes are down
> - As I understood it, one node will buffer the writes for the remaining
> nodes.

Yes: _after_ the failure detector recognizes them as down. This will
take several seconds.

> - If the nodes go up again: When will these writes be propagated

When FD recognizes them as back up.
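The buffering behaviour Jonathan describes can be sketched as a toy model. This is assumption-laden and not Cassandra internals: writes aimed at a replica the failure detector has marked down are stored as hints on a live node, then replayed when the detector marks it up again. All class and method names are illustrative.

```python
# Toy model of hinted handoff (illustrative; not Cassandra's implementation).
from collections import defaultdict

class Cluster:
    def __init__(self, nodes):
        self.up = set(nodes)
        self.data = defaultdict(dict)   # node -> {key: value}
        self.hints = defaultdict(list)  # down node -> pending (key, value) hints

    def write(self, replicas, key, value):
        for node in replicas:
            if node in self.up:
                self.data[node][key] = value
            else:                       # FD already marked it down: buffer a hint
                self.hints[node].append((key, value))

    def mark_up(self, node):
        self.up.add(node)
        for key, value in self.hints.pop(node, []):  # replay hints on recovery
            self.data[node][key] = value

c = Cluster(["n1", "n2", "n3"])
c.up.discard("n3")                      # n3 goes down (and the FD notices)
c.write(["n1", "n2", "n3"], "k", "v")
print(c.data["n3"])                     # {} : n3 missed the write
c.mark_up("n3")
print(c.data["n3"])                     # {'k': 'v'} : hint replayed on recovery
```

Note the caveat from earlier in the thread: this buffering only starts once the failure detector has marked the node down, so writes in the few seconds before detection can be missed, which is what nodetool repair cleans up.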

> The best way would then be to run nodetool repair after the two nodes are
> available again. Is there a way to make the node not accept any
> connections during that time until it is finished repairing? (e.g. throw an
> UnavailableException)

No.  The way to prevent stale reads is to use an appropriate
consistency level, not error-prone heuristics.  (For instance: what if
the replica with the most recent data were itself down when the first
node recovered and initiated repair?)

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
