Thanks Jonathan, that gets exactly to the heart of my question. Unfortunately it kills my original idea of implementing a "unique transaction identifier creation algorithm". For that, even eventual consistency would be sufficient, but I would need to know whether the data is consistent at the time of a read request.

One last question (sorry to bother you): isn't the behavior of read repair strictly deterministic in this case? You say both read requests could try to read repair the result (each time in the opposite direction). Inside the read repair algorithm, when two values have exactly the same timestamp, which value is chosen for the repair? The first one the node received in response to the read request? If we made that choice deterministic, we could avoid this scenario, right?
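
To make concrete what I mean by "deterministic", here is a minimal sketch in Python. It is not Cassandra's actual code: the `Column` type, the `reconcile` function, and the idea of breaking timestamp ties by comparing the raw value bytes are all my own assumptions for illustration.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for a stored column version."""
    value: bytes
    timestamp: int


def reconcile(a: Column, b: Column) -> Column:
    """Pick a winner that does not depend on the order of the arguments."""
    if a.timestamp != b.timestamp:
        # Usual rule: the newer timestamp wins.
        return a if a.timestamp > b.timestamp else b
    # Equal timestamps: assumed tie-break on the value bytes, so every
    # node that sees both versions elects the same one.
    return a if a.value >= b.value else b


col_a = Column(b"value_A", 1)
col_b = Column(b"value_B", 1)

# Same winner regardless of which response arrived first.
winner_ab = reconcile(col_a, col_b)
winner_ba = reconcile(col_b, col_a)
```

With a rule like this, two overlapping repairs would push the same value in both directions, so they could not undo each other.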


2010/4/28 Jonathan Ellis:
2010/4/28 Roland Hänel:
> Two clients insert the same key/column with different values at the same
> time:
>    client A does insert(keyspace, key_1, column_name_1, value_A, timestamp_1, consistency_level.QUORUM)
>    client B does insert(keyspace, key_1, column_name_1, value_B, timestamp_1, consistency_level.QUORUM)
> After that, both clients read their value:
>    client A does get(keyspace, key_1, column_name_1, consistency_level.QUORUM)
>    client B does get(keyspace, key_1, column_name_1, consistency_level.QUORUM)
> It is obvious that since the inserts happen 'at the same time', i.e. with
> the same timestamp, we cannot say which value (value_A or value_B) gets
> written to the row. However, do we have a guarantee that either value_A
> or value_B is written, and that both read operations will return the same
> result?

The guarantee is that "eventually" you will get a consistent result.

Say both writes overlap such that value A is present on replicas R1
and R2, and value B is present on replica R3 (after both writes
complete).

Simultaneous read operations could then both attempt to "repair" the
other nodes, and again there could be overlap, resulting in still 2
values present, possibly on different nodes this time.

So: you can see different values on reads when there are two
"simultaneous" writes, and this can continue in the worst-case
scenario until one read's repair can finish before another begins.

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support