incubator-cassandra-user mailing list archives

From Sylvain Lebresne <>
Subject Re: Detailed behavior of insert() operation?
Date Wed, 28 Apr 2010 16:30:09 GMT
> One last question (sorry to bother you): isn't the behavior of read repair
> strictly deterministic in this case? You say both read requests could try to
> read repair the result (each time in the opposite direction). Inside the
> read repair algorithm, when we have exactly the same timestamps, what value
> is elected for repair? The first one that the node got in the read request?
> If we make that deterministic, we could avoid this scenario, right?

This is deterministic but not centralized. Say node A holds value va and
node B holds value vb, both with the same timestamp, and you read
simultaneously on A and B. A gets vb from B, sees the same timestamp, and
deterministically decides to take the new value. B does the same in the
meantime, so the two values simply swap places. The point is that each
node applies the same rule independently. You could break the cycle if
read repair used some total ordering on the nodes to decide what to keep
on ties (in my example, A and B would both decide to keep A's version,
for instance). But there is no easy way at all to do such things in the
current implementation.
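
To make the swap concrete, here is a minimal Python sketch of the two tie
rules discussed above. It is not Cassandra code: the Versioned class, the
origin field and all three functions are hypothetical names made up for
illustration, and the "lowest node name wins" ordering is just one
possible total ordering on nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int
    origin: str  # hypothetical: name of the node that held this copy


def resolve_take_incoming(local, incoming):
    """Tie rule described above: on equal timestamps, take the incoming value."""
    if incoming.timestamp >= local.timestamp:
        return incoming
    return local


def resolve_by_node_order(local, incoming):
    """Hypothetical tie rule: on equal timestamps, keep the copy that came
    from the node with the lowest name (a total ordering on nodes)."""
    if incoming.timestamp != local.timestamp:
        return incoming if incoming.timestamp > local.timestamp else local
    return min(local, incoming, key=lambda v: v.origin)


def simultaneous_repair(a, b, resolve):
    """Both nodes see the other's copy before either repair is applied."""
    new_a = resolve(a, b)
    new_b = resolve(b, a)
    return new_a, new_b


a = Versioned("va", 1, "A")
b = Versioned("vb", 1, "B")

swapped = simultaneous_repair(a, b, resolve_take_incoming)
converged = simultaneous_repair(a, b, resolve_by_node_order)

print(swapped[0].value, swapped[1].value)      # vb va
print(converged[0].value, converged[1].value)  # va va
```

With the first rule each node makes the same local decision and the values
trade places, so the replicas still disagree. With the node ordering, both
nodes pick the same winner and converge after one round.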

> -Roland
> 2010/4/28 Jonathan Ellis <>
>> 2010/4/28 Roland Hänel <>:
>> > Two clients insert the same key/column with different values at the same
>> > time:
>> >
>> >    client A does insert(keyspace, key_1, column_name_1, value_A,
>> >        timestamp_1, consistency_level.QUORUM)
>> >    client B does insert(keyspace, key_1, column_name_1, value_B,
>> >        timestamp_1, consistency_level.QUORUM)
>> >
>> > After that, both clients read their value:
>> >
>> >    client A does get(keyspace, key_1, column_name_1,
>> >        consistency_level.QUORUM)
>> >    client B does get(keyspace, key_1, column_name_1,
>> >        consistency_level.QUORUM)
>> >
>> > It is obvious that since the insert happens 'at the same time', i.e. with
>> > the same timestamp, we cannot say which value (value_A or value_B) gets
>> > written to the row. However, do we have a guarantee that either value_A
>> > or value_B is written, and that both read operations will return the
>> > same result?
>> The guarantee is that "eventually" you will get a consistent result.
>> Say both writes overlap such that value A is present on replicas R1
>> and R2, and value B is present on replica R3 (after both writes
>> complete).
>> Simultaneous read operations could then both attempt to "repair" the
>> other nodes, and again there could be overlap, resulting in still 2
>> values present, possibly on different nodes this time.
>> So: you can see different values on reads when there are two
>> "simultaneous" writes, and this can continue in the worst-case
>> scenario until one read's repair can finish before another begins.
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
