cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Haddad <...@jonhaddad.com>
Subject Re: CQL 3.x Update ...USING TIMESTAMP...
Date Thu, 12 Mar 2015 18:58:51 GMT
In most datacenters you're going to see significant variance in your server
times.  Likely > 20ms between servers in the same rack.  Even google, using
atomic clocks, has 1-7ms variance.  [1]

I would +1 Tyler's advice here, as using the clocks is only valid if clocks
are perfectly sync'ed, which they are not, and likely never will be in our
lifetime.

[1] http://queue.acm.org/detail.cfm?id=2745385


On Thu, Mar 12, 2015 at 7:04 AM Eric Stevens <mightye@gmail.com> wrote:

> > It's possible, but you'll end up with problems when attempting to
> overwrite or delete entries
>
> I'm wondering if you can elucidate on that a little bit, do you just mean
> that it's easy to forget to always set your timestamp correctly, and if you
> goof it up, it makes it difficult to recover from (i.e. you issue a delete
> with system timestamp instead of document version, and that's way larger
> than your document version would ever be, so you can never write that
> document again)?  Or is there some bug in write timestamps that can cause
> the wrong entry to win the write contention?
>
> We're looking at doing something similar to keep a live max value column
> in a given table, our setup is as follows:
>
> CREATE TABLE a (
>   id <whatever>,
>   time timestamp,
>   max_b_foo int,
>   PRIMARY KEY (id)
> );
> CREATE TABLE b (
>   b_id <whatever>,
>   a_id <whatever>,
>   a_timestamp timestamp,
>   foo int,
>   PRIMARY KEY (a_id, b_id)
> );
>
> The idea being that there's a one-to-many relationship between *a* and *b*.
> We want *a* to know what the maximum value is in *b* for field *foo* so
> we can avoid reading *all* *b* when we want to resolve *a*. You can see
> that we can't just use *b*'s clustering key to resolve that with LIMIT 1;
> also this is for DSE Solr, which wouldn't be able to query a by max b.foo
> anyway.  So when we write to *b*, we also write to *a* with something
> like
>
> UPDATE a USING TIMESTAMP ${b.a_timestamp.toMicros + b.foo} SET max_b_foo =
> ${b.foo} WHERE id = ${b.a_id}
>
> Assuming that we don't run afoul of related antipatterns such as
> repeatedly overwriting the same value indefinitely, this strikes me as
> sound if unorthodox practice, as long as conflict resolution in Cassandra
> isn't broken in some subtle way.  We also designed this to be safe from
> getting write timestamps greatly out of sync with clock time so that
> non-timestamped operations (especially delete) if done accidentally will
> still have a reasonable chance of having the expected results.
>
> So while it may not be the intended use case for write timestamps, and
> there are definitely gotchas if you are not careful or misunderstand the
> consequences, as far as I can see the logic behind it is sound but does
> rely on correct conflict resolution in Cassandra.  I'm curious if I'm
> missing or misunderstanding something important.
>
> On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs <tyler@datastax.com> wrote:
>
>> Don't use the version as your timestamp.  It's possible, but you'll end
>> up with problems when attempting to overwrite or delete entries.
>>
>> Instead, make the version part of the primary key:
>>
>> CREATE TABLE document_store (document_id bigint, version int, document
>> text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY (version
>> desc)
>>
>> That way you don't have to worry about overwriting higher versions with a
>> lower one, and to read the latest version, you only have to do:
>>
>> SELECT * FROM document_store WHERE document_id = ? LIMIT 1;
>>
>> Another option is to use lightweight transactions (i.e. UPDATE ... SET
>> docuement = ?, version = ? WHERE document_id = ? IF version < ?), but
>> that's going to make writes much more expensive.
>>
>> On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam <sknikam@gmail.com> wrote:
>>
>>> I am planning to use the Update...USING TIMESTAMP... statement to make
>>> sure that I do not overwrite fresh data with stale data while having to
>>> avoid doing at least LOCAL_QUORUM writes.
>>>
>>> Here is my table structure.
>>>
>>> Table=DocumentStore
>>> DocumentID (primaryKey, bigint)
>>> Document(text)
>>> Version(int)
>>>
>>> If the service receives 2 write requests with Version=1 and Version=2,
>>> regardless of the order of arrival, the business requirement is that we end
>>> up with Version=2 in the database.
>>>
>>> Can I use the following CQL Statement?
>>>
>>> Update DocumentStore using <versionValue>
>>> SET  Document=<documentValue>,
>>> Version=<versionValue>
>>> where DocumentID=<documentIDValue>;
>>>
>>> Has anybody used something like this? If so was the behavior as expected?
>>>
>>> Regards
>>> Sachin
>>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
>

Mime
View raw message