cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eric Stevens <migh...@gmail.com>
Subject Re: CQL 3.x Update ...USING TIMESTAMP...
Date Thu, 12 Mar 2015 14:02:30 GMT
> It's possible, but you'll end up with problems when attempting to
overwrite or delete entries

I'm wondering if you can elucidate on that a little bit, do you just mean
that it's easy to forget to always set your timestamp correctly, and if you
goof it up, it makes it difficult to recover from (i.e. you issue a delete
with system timestamp instead of document version, and that's way larger
than your document version would ever be, so you can never write that
document again)?  Or is there some bug in write timestamps that can cause
the wrong entry to win the write contention?

We're looking at doing something similar to keep a live max value column in
a given table, our setup is as follows:

CREATE TABLE a (
  id <whatever>,
  time timestamp,
  max_b_foo int,
  PRIMARY KEY (id)
);
CREATE TABLE b (
  b_id <whatever>,
  a_id <whatever>,
  a_timestamp timestamp,
  foo int,
  PRIMARY KEY (a_id, b_id)
);

The idea being that there's a one-to-many relationship between *a* and *b*.
We want *a* to know what the maximum value is in *b* for field *foo* so we
can avoid reading *all* *b* when we want to resolve *a*. You can see that
we can't just use *b*'s clustering key to resolve that with LIMIT 1; also
this is for DSE Solr, which wouldn't be able to query a by max b.foo
anyway.  So when we write to *b*, we also write to *a* with something like

UPDATE a USING TIMESTAMP ${b.a_timestamp.toMicros + b.foo} SET max_b_foo =
${b.foo} WHERE id = ${b.a_id}

Assuming that we don't run afoul of related antipatterns such as repeatedly
overwriting the same value indefinitely, this strikes me as sound if
unorthodox practice, as long as conflict resolution in Cassandra isn't
broken in some subtle way.  We also designed this to be safe from getting
write timestamps greatly out of sync with clock time so that
non-timestamped operations (especially delete) if done accidentally will
still have a reasonable chance of having the expected results.

So while it may not be the intended use case for write timestamps, and
there are definitely gotchas if you are not careful or misunderstand the
consequences, as far as I can see the logic behind it is sound but does
rely on correct conflict resolution in Cassandra.  I'm curious if I'm
missing or misunderstanding something important.

On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs <tyler@datastax.com> wrote:

> Don't use the version as your timestamp.  It's possible, but you'll end up
> with problems when attempting to overwrite or delete entries.
>
> Instead, make the version part of the primary key:
>
> CREATE TABLE document_store (document_id bigint, version int, document
> text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY (version
> desc)
>
> That way you don't have to worry about overwriting higher versions with a
> lower one, and to read the latest version, you only have to do:
>
> SELECT * FROM document_store WHERE document_id = ? LIMIT 1;
>
> Another option is to use lightweight transactions (i.e. UPDATE ... SET
> docuement = ?, version = ? WHERE document_id = ? IF version < ?), but
> that's going to make writes much more expensive.
>
> On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam <sknikam@gmail.com> wrote:
>
>> I am planning to use the Update...USING TIMESTAMP... statement to make
>> sure that I do not overwrite fresh data with stale data while having to
>> avoid doing at least LOCAL_QUORUM writes.
>>
>> Here is my table structure.
>>
>> Table=DocumentStore
>> DocumentID (primaryKey, bigint)
>> Document(text)
>> Version(int)
>>
>> If the service receives 2 write requests with Version=1 and Version=2,
>> regardless of the order of arrival, the business requirement is that we end
>> up with Version=2 in the database.
>>
>> Can I use the following CQL Statement?
>>
>> Update DocumentStore using <versionValue>
>> SET  Document=<documentValue>,
>> Version=<versionValue>
>> where DocumentID=<documentIDValue>;
>>
>> Has anybody used something like this? If so was the behavior as expected?
>>
>> Regards
>> Sachin
>>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>

Mime
View raw message