cassandra-user mailing list archives

From Jeff Jirsa <>
Subject Re: Replication lag between data center
Date Thu, 19 May 2016 03:40:41 GMT
Cassandra isn’t a traditional DB – it doesn’t “replicate” in the same way that a
relational DB replicates.

Cassandra clients send mutations (via native protocol or thrift). Those mutations include
a minimum consistency level for the server to return a successful write.

If a write says “Consistency: ALL”, then as soon as the write returns, the mutation exists
on all replicas (no replication delay – it’s done).
If a write is anything other than ALL, it’s possible that any individual node may not have
the write when the client is told the write succeeds. At that point, the coordinator will
make a best effort to deliver the write to all nodes in real time, but may fail or time out.
As far as I know, there are no metrics on this delivery – I believe the writes prior to
the coordinator returning may have some basic data in TRACE, but I wouldn’t expect writes
after the coordinator returned to have tracing data available.
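
To make the consistency-level point concrete, here is a rough sketch of how many replica acknowledgements a coordinator waits for before reporting success. The names `required_acks` and `possibly_lagging_replicas` are made up for illustration, only ONE, QUORUM and ALL are covered, and this is not Cassandra's actual code.

```python
def required_acks(consistency: str, replication_factor: int) -> int:
    """Replica acknowledgements needed before a write is reported successful.

    Hypothetical helper: covers only ONE, QUORUM and ALL for one replication factor.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # strict majority of replicas
        "ALL": replication_factor,
    }
    return levels[consistency]


def possibly_lagging_replicas(consistency: str, replication_factor: int) -> int:
    """Replicas that may not yet hold the write when the client sees success."""
    return replication_factor - required_acks(consistency, replication_factor)
```

With a replication factor of 3, a QUORUM write waits for 2 acknowledgements, so 1 replica may still be behind when the client is told the write succeeded; with ALL, that number is 0.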

If any individual node times out completely, the coordinator writes a hint. When the coordinator
sees the node come back online, it will try to replay the writes by replaying the hints –
this may happen minutes or hours later.
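
The hint mechanism can be sketched as a toy in-memory model. Everything here (`ToyCoordinator`, its fields, the integer write stamps) is an assumption made for illustration; real hint storage, delivery and timestamp handling are far more involved.

```python
import itertools


class ToyCoordinator:
    """Toy model of hinted handoff; not Cassandra's implementation."""

    def __init__(self, replica_names):
        self.data = {name: {} for name in replica_names}  # replica -> {key: (stamp, value)}
        self.up = {name: True for name in replica_names}  # replica -> reachable?
        self.hints = []                                   # (replica, key, stamp, value)
        self._clock = itertools.count(1)                  # stand-in for write timestamps

    def _apply(self, name, key, stamp, value):
        # Keep the newest write only, so a late hint never clobbers newer data.
        current = self.data[name].get(key)
        if current is None or current[0] < stamp:
            self.data[name][key] = (stamp, value)

    def write(self, key, value):
        """Deliver to reachable replicas, store hints for the rest, return ack count."""
        stamp = next(self._clock)
        acks = 0
        for name in self.data:
            if self.up[name]:
                self._apply(name, key, stamp, value)
                acks += 1
            else:
                self.hints.append((name, key, stamp, value))
        return acks

    def replay_hints(self):
        """Replay hints for replicas that are back; keep the others for later."""
        still_pending = []
        for name, key, stamp, value in self.hints:
            if self.up[name]:
                self._apply(name, key, stamp, value)
            else:
                still_pending.append((name, key, stamp, value))
        self.hints = still_pending
```

In this model the "lag" for a down replica is simply the time between `write` and the first `replay_hints` call made after the replica is marked up again.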

If it’s unable to replay hints, or if writes are missed for some other reason, the data
may never “replicate” to the other nodes/DCs on its own – you may need to manually “replicate”
it by running `nodetool repair`.

Taken together, there’s no simple “replication lag” here – if you write with ALL,
the lag is “none”. If you write with CL:QUORUM and read with CL:QUORUM, your effective
lag is “probably none”, because every read quorum overlaps the write quorum and replicas
missing the data are read-repaired on read. If you read or write with low consistency,
your lag may be milliseconds, hours, weeks, or forever, depending on how long your link
is down and how often you repair.
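
The QUORUM/QUORUM case comes down to simple arithmetic: if the replicas that acknowledged the write plus the replicas consulted on read exceed the replication factor, at least one replica in every read has the write. A small sketch of that check, with hypothetical function names:

```python
def quorum(replication_factor: int) -> int:
    """Strict majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_acks: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every acknowledged write set."""
    return write_acks + read_replicas > replication_factor


rf = 3
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # QUORUM write + QUORUM read
print(reads_see_latest_write(1, 1, rf))                    # ONE write + ONE read
```

With a replication factor of 3, QUORUM plus QUORUM gives 2 + 2 > 3, so the sets always overlap; ONE plus ONE gives 1 + 1, which does not, and that is where the "milliseconds to forever" range comes from.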

From:  cass savy
Reply-To:  ""
Date:  Wednesday, May 18, 2016 at 8:03 PM
To:  ""
Subject:  Replication lag between data center

How can we determine/measure the replication lag or latency between on-premises data centers
or across regions/availability zones?
