cassandra-user mailing list archives

From Benedict Elliott Smith <belliottsm...@datastax.com>
Subject Re: How to maintain the N-most-recent versions of a value?
Date Fri, 18 Jul 2014 10:30:12 GMT
If the versions can be guaranteed to be adjacent (i.e. if the latest
version is V, the prior version is V-1), you could issue, alongside each
insert of version V, a delete for version V-N-buffer, where buffer >= 0.
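
A minimal Python sketch of this delete-on-insert idea, assuming versions of a cell are consecutive integers starting at 1 and targeting the `foo` table from the quoted message. `versioned_write` and `simulate` are hypothetical helpers: the first returns the CQL statements (with `?` bind markers) to send together, for example in one batch, and the second replays them against an in-memory dict instead of a real cluster.

```python
# Hypothetical helpers, not part of any driver: pair each insert of version v
# with a delete of version v - n - buffer so that n + buffer versions remain.
# Assumes versions of a cell are consecutive integers starting at 1.

INSERT_CQL = (
    "INSERT INTO foo (rowkey, family, qualifier, version, value) "
    "VALUES (?, ?, ?, ?, ?)"
)
DELETE_CQL = (
    "DELETE FROM foo WHERE rowkey = ? AND family = ? "
    "AND qualifier = ? AND version = ?"
)


def versioned_write(rowkey, family, qualifier, version, value, n, buffer=0):
    """Return the (cql, params) pairs to send together, e.g. in one batch."""
    if n < 1 or buffer < 0:
        raise ValueError("need n >= 1 and buffer >= 0")
    stmts = [(INSERT_CQL, (rowkey, family, qualifier, version, value))]
    stale = version - n - buffer
    if stale >= 1:  # nothing old enough to delete during the first n + buffer writes
        stmts.append((DELETE_CQL, (rowkey, family, qualifier, stale)))
    return stmts


def simulate(num_writes, n, buffer=0):
    """Replay versions 1..num_writes of one cell against a dict; return live versions."""
    cell = {}
    for v in range(1, num_writes + 1):
        for cql, params in versioned_write("r", "f", "q", v, b"x", n, buffer):
            if cql == INSERT_CQL:
                cell[params[3]] = params[4]
            else:
                cell.pop(params[3], None)
    return sorted(cell, reverse=True)


print(simulate(10, n=3))            # [10, 9, 8]
print(simulate(10, n=3, buffer=1))  # [10, 9, 8, 7]
```

A positive `buffer` simply leaves N + buffer versions in place (four instead of three in the second call), trading a little storage for some margin. The whole scheme breaks down if versions skip numbers, which is why the adjacency guarantee matters.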

In general, guaranteeing that is probably hard, so this seems like something
that would be nice to have C* manage for you. Unfortunately, we don't have
anything on the roadmap to help with this. A custom compaction strategy
might do the trick, as might a filter applied during compaction that can
omit or tombstone certain records based on the input data. This latter option
probably wouldn't be too hard to implement, although it might not offer any
guarantees about expiring records in order without incurring extra
compaction cost. You could reasonably easily guarantee that the most recent
N are present, but if you want to do it cheaply, the cleanup of older records
might happen haphazardly: in no particular order and with no promptness
guarantees. Feel free to file a ticket, or submit a patch!
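
To see why a cheap compaction-time filter can guarantee that the newest N versions survive while cleaning up older ones haphazardly, here is a toy Python model. It is not Cassandra code and does not use any Cassandra compaction API: each SSTable is modelled as a set of version numbers for a single cell, and `compact` is a hypothetical filter that keeps only the N highest versions among the SSTables it happens to merge.

```python
import heapq

# Toy model, not Cassandra code: each SSTable is a set of version numbers for
# one (rowkey, family, qualifier) cell. The hypothetical filter keeps only the
# n highest versions among the SSTables that a given compaction merges.


def compact(sstables, n):
    """Merge the given SSTables into one, dropping all but the n newest versions."""
    merged = set().union(*sstables)
    return set(heapq.nlargest(n, merged))


n = 3
sstables = [{1, 2, 3, 9}, {4, 5, 10}, {6, 7, 8}]

# Only the first two SSTables happen to be compacted together.
after = [compact(sstables[:2], n)] + sstables[2:]
surviving = set().union(*after)

print(sorted(after[0]))   # [5, 9, 10]
print(sorted(surviving))  # [5, 6, 7, 8, 9, 10]
```

The true newest three versions (8, 9 and 10) all survive: a version is dropped only when the same compaction keeps N newer versions, so it cannot be among the overall N newest. Versions 5, 6 and 7 linger until some later compaction happens to merge them with newer data, which is the unordered, unprompt cleanup the reply describes.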


On Fri, Jul 18, 2014 at 1:32 AM, Clint Kelly <clint.kelly@gmail.com> wrote:

> Hi everyone,
>
> I am trying to design a schema that will keep the N-most-recent
> versions of a value.  Currently my table looks like the following:
>
> CREATE TABLE foo (
>     rowkey text,
>     family text,
>     qualifier text,
>     version bigint,
>     value blob,
>     PRIMARY KEY (rowkey, family, qualifier, version))
> WITH CLUSTERING ORDER BY (family ASC, qualifier ASC, version DESC);
>
> Is there any standard design pattern for updating such a layout so
> that I keep the N-most-recent (version, value) pairs for every unique
> (rowkey, family, qualifier)?  I can't think of any way to do this
> without doing a read-modify-write.  The best thing I can think of is
> to use TTL to approximate the desired behavior (which will work if I
> know how often we are writing new data to the table).  I could also
> use "LIMIT N" in my queries to limit myself to only N items, but that
> does not address any of the storage-size issues.
>
> In case anyone is curious, this question is related to some work that
> I am doing translating a system built on HBase (which provides this
> "keep the N-most-recent-version-of-a-cell" behavior) to Cassandra
> while providing the user with as similar an interface as possible.
>
> Best regards,
> Clint
>
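
The TTL approximation mentioned in the quoted message can be made concrete with a little arithmetic: if a new version arrives every T seconds and each is written with a TTL of N * T, then about N versions are live at any moment, while `LIMIT N` on reads caps what a query returns. The Python sketch below is hedged: the helper names and CQL strings are illustrative, they assume the `foo` table above, and they assume writes arrive at a steady interval.

```python
import math

# Hypothetical helpers for the TTL approximation; assumes new versions of a
# cell arrive at a roughly fixed interval (in whole seconds).

INSERT_TTL_CQL = (
    "INSERT INTO foo (rowkey, family, qualifier, version, value) "
    "VALUES (?, ?, ?, ?, ?) USING TTL ?"
)
SELECT_CQL = (
    "SELECT version, value FROM foo WHERE rowkey = ? AND family = ? "
    "AND qualifier = ? LIMIT ?"
)


def approx_ttl_seconds(n, write_interval_s, slack=1.0):
    """TTL under which roughly n versions are alive; slack > 1 keeps extras."""
    if n < 1 or write_interval_s <= 0 or slack <= 0:
        raise ValueError("n, write_interval_s and slack must be positive")
    return math.ceil(n * write_interval_s * slack)


def versions_alive(ttl, write_interval_s, now_s):
    """Count versions written at 0, interval, 2*interval, ... still live at now_s.

    A version written at time t is treated as live while t + ttl > now_s.
    """
    writes = range(0, now_s + 1, write_interval_s)
    return sum(1 for t in writes if t + ttl > now_s)


ttl = approx_ttl_seconds(5, 60)
print(ttl)                            # 300
print(versions_alive(ttl, 60, 600))   # 5
print(approx_ttl_seconds(5, 60, 1.5)) # 450
```

This only holds while the write rate is steady: a burst of writes leaves more than N versions alive until they expire, and a pause lets the count drop below N, which is why TTL is an approximation rather than a guarantee.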
