incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Why do Digest Queries return hash instead of timestamp?
Date Wed, 13 Jul 2011 15:05:28 GMT
(1) The hash calculation costs only a small amount of CPU; MD5 is
specifically designed to be efficient in this kind of situation.
(2) We compute one hash per query, so for multiple columns the
advantage over a timestamp per column grows large quickly.
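Both points can be made concrete with a minimal Python sketch. It assumes, for illustration only, that a digest is an MD5 over each column's name, value, and timestamp. The row model and the names `row_digest` and `timestamp_payload` are hypothetical and do not reflect Cassandra's internals. One digest stays 16 bytes however many columns a query touches, while sending timestamps costs 8 bytes per column. The digest also catches two replicas whose values differ under the same timestamp, which timestamps alone cannot see.

```python
import hashlib
import struct

# Hypothetical row model: a list of (name, value, timestamp) tuples, where
# name and value are bytes and timestamp is an int (e.g. microseconds).

def row_digest(columns):
    """One MD5 digest covering every column's name, value and timestamp."""
    md5 = hashlib.md5()
    for name, value, timestamp in sorted(columns):
        # Length-prefix each field so b"ab"+b"c" and b"a"+b"bc" hash differently.
        for field in (name, value):
            md5.update(struct.pack(">I", len(field)))
            md5.update(field)
        md5.update(struct.pack(">q", timestamp))
    return md5.digest()

def timestamp_payload(columns):
    """The alternative suggested in the thread: one 8-byte timestamp per column."""
    return b"".join(struct.pack(">q", ts) for _, _, ts in sorted(columns))

# A 100-column row, and a copy whose last column has a different value
# but the same timestamp (the tie case discussed in the thread).
row = [(b"col%03d" % i, b"v1", 1000) for i in range(100)]
diverged = row[:-1] + [(b"col099", b"v2", 1000)]

print(len(row_digest(row)))         # 16: digest size does not grow with columns
print(len(timestamp_payload(row)))  # 800: 8 bytes per column
print(row_digest(row) == row_digest(diverged))                # False
print(timestamp_payload(row) == timestamp_payload(diverged))  # True
```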

On Wed, Jul 13, 2011 at 7:31 AM, David Boxenhorn <david@citypath.com> wrote:
> Is that the actual reason?
>
> This seems like a big inefficiency to me. For those of us who don't worry
> about this extreme edge case (that probably will NEVER happen in real life,
> for most applications), is there a way to turn this off?
>
> Or am I wrong about this making the operation MUCH more expensive?
>
>
> On Wed, Jul 13, 2011 at 3:20 PM, Boris Yen <yulinyen@gmail.com> wrote:
>>
>> For a specific column, if there are two versions with the same timestamp,
>> the value of the column is used to break the tie:
>> if v1.value().compareTo(v2.value()) < 0, then v2 wins.
>>
>> On Wed, Jul 13, 2011 at 7:13 PM, David Boxenhorn <david@citypath.com> wrote:
>>>
>>> How would you know which data is correct, if they both have the same
>>> timestamp?
>>>
>>> On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen <yulinyen@gmail.com> wrote:
>>>>
>>>> I can only say that the "data" does matter, which is why the developers use a
>>>> hash instead of the timestamp. If the hash value from another node does not
>>>> match, a read repair is performed so that the correct data can be returned.
>>>>
>>>> On Wed, Jul 13, 2011 at 5:08 PM, David Boxenhorn <david@citypath.com>
>>>> wrote:
>>>>>
>>>>> If you have two pieces of data that are different but have the same
>>>>> timestamp, how can you resolve consistency?
>>>>>
>>>>> This is a pathological situation to begin with; why should you waste
>>>>> effort to (not) solve it?
>>>>>
>>>>> On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen <yulinyen@gmail.com> wrote:
>>>>>>
>>>>>> I guess it is because the timestamp does not guarantee data
>>>>>> consistency, but the hash does.
>>>>>> Boris
>>>>>>
>>>>>> On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn <david@citypath.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I just saw this
>>>>>>>
>>>>>>> http://wiki.apache.org/cassandra/DigestQueries
>>>>>>>
>>>>>>> and I was wondering why it returns a hash of the data. Wouldn't it be
>>>>>>> better and easier to return the timestamp? You don't really care what the
>>>>>>> data is; you only care whether it is more or less recent than another
>>>>>>> piece of data.
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
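Boris's two points can be combined into a small Python sketch: the timestamp-then-value tie-break, and a read repair triggered by a digest mismatch. Everything here is a simplified, hypothetical model. The `Column` type and the function names are made up, and deletions (tombstones) are ignored. Repair simply overwrites each replica with the merged row. Python's unsigned lexicographic `bytes` ordering stands in for the `compareTo` on values quoted above, whose exact byte ordering this sketch does not attempt to reproduce.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int

def reconcile(c1, c2):
    """Return the winning version of a column: the newer timestamp wins and,
    on a tie, the greater value wins. Deletions are ignored in this sketch."""
    if c1.timestamp != c2.timestamp:
        return c1 if c1.timestamp > c2.timestamp else c2
    return c2 if c1.value < c2.value else c1

def digest(columns):
    """MD5 over each column's length-prefixed name, value and timestamp."""
    md5 = hashlib.md5()
    for col in sorted(columns, key=lambda c: c.name):
        for part in (col.name, col.value, col.timestamp.to_bytes(8, "big", signed=True)):
            md5.update(len(part).to_bytes(4, "big"))
            md5.update(part)
    return md5.digest()

def read_with_repair(replicas):
    """replicas: one dict (column name -> Column) per replica. The first
    replica answers with full data; the others are compared by digest."""
    data = replicas[0]
    expected = digest(data.values())
    if all(digest(r.values()) == expected for r in replicas[1:]):
        return dict(data)  # every digest matched: no repair needed
    # Mismatch: merge all replicas column by column using reconcile().
    merged = {}
    for r in replicas:
        for name, col in r.items():
            merged[name] = reconcile(merged[name], col) if name in merged else col
    # "Repair": overwrite every replica with the merged row.
    for r in replicas:
        r.clear()
        r.update(merged)
    return merged

# Two replicas hold different values under the same timestamp.
a = {b"email": Column(b"email", b"a@example.com", 1000)}
b = {b"email": Column(b"email", b"b@example.com", 1000)}
result = read_with_repair([a, b])
print(result[b"email"].value)  # b'b@example.com': tie broken by the greater value
print(a == b)                  # True: both replicas now agree
```

The digest comparison is what lets the coordinator notice this same-timestamp divergence at all. Comparing timestamps alone would have reported these two replicas as identical.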



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
