cassandra-user mailing list archives

From Carlos Alonso <i...@mrcalonso.com>
Subject Re: Schema questions for data structures with recently-modified access patterns
Date Tue, 21 Jul 2015 15:13:53 GMT
Hi Robert,

What about modelling it as a time series?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY(docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, the latest modification will always be the first record in the
row, so reading it is as simple as:

SELECT * FROM document WHERE docId = <the docId> LIMIT 1;
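
To illustrate why LIMIT 1 returns the latest version, here is a minimal in-memory sketch in Python. It assumes a single docId partition can be modelled as a map keyed by the clustering column last_modified (so rewriting the same key is an upsert), read back in descending order. The class and method names (Partition, insert, select) are hypothetical and are not part of any Cassandra driver API.

```python
from datetime import datetime


class Partition:
    """Hypothetical in-memory model of one docId partition.

    Rows are keyed by the clustering column last_modified, so writing the
    same key twice overwrites the row (an upsert) instead of duplicating it.
    """

    def __init__(self):
        self._rows = {}  # last_modified -> doc

    def insert(self, last_modified, doc):
        self._rows[last_modified] = doc

    def select(self, limit=None):
        # Emulates CLUSTERING ORDER BY (last_modified DESC): newest first.
        ordered = sorted(self._rows.items(), key=lambda kv: kv[0], reverse=True)
        return ordered if limit is None else ordered[:limit]


p = Partition()
p.insert(datetime(2015, 7, 20, 9, 0), "v1")
p.insert(datetime(2015, 7, 21, 5, 59), "v3")
p.insert(datetime(2015, 7, 20, 17, 30), "v2")  # insertion order does not matter

# Rough equivalent of: SELECT * FROM document WHERE docId = ? LIMIT 1;
print(p.select(limit=1))
```

Because the ordering is part of the storage layout rather than something computed at query time, the newest version is always at the head of the partition regardless of the order in which versions were written.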

And if you run into disk space issues because of very long rows, you can
always expire old versions using a TTL or with a batch job. Tombstones will
not be a problem in this case: because of the specified clustering order,
the latest modification will always be the first record in the row, so a
LIMIT 1 read stops before it reaches the expired older entries.
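
The tombstone argument can be checked with a small simulation. This is a simplified sketch, not Cassandra's actual read path: it assumes each version is written with a TTL (in CQL, via INSERT ... USING TTL), that a version expires at last_modified + ttl, and that expired versions stay in the partition as dead entries standing in for tombstones. Compaction and gc_grace_seconds are not modelled, and TtlPartition, dead and latest are hypothetical names.

```python
from datetime import datetime, timedelta


class TtlPartition:
    """Hypothetical model of one docId partition whose rows carry a TTL.

    Assumption: a row expires at last_modified + ttl (write time is taken to
    equal last_modified). Expired rows are kept in the partition, standing in
    for the tombstones Cassandra keeps until compaction removes them.
    """

    def __init__(self):
        self._rows = {}  # last_modified -> (doc, expires_at)

    def insert(self, last_modified, doc, ttl):
        self._rows[last_modified] = (doc, last_modified + ttl)

    def dead(self, now):
        """Number of expired (tombstone-like) rows still in the partition."""
        return sum(1 for _, exp in self._rows.values() if exp <= now)

    def latest(self, now, newest_first=True):
        """Emulate a LIMIT 1 read; return (doc, rows_scanned)."""
        scanned = 0
        for ts in sorted(self._rows, reverse=newest_first):
            scanned += 1
            doc, expires_at = self._rows[ts]
            if expires_at > now:  # the first live row ends the read
                return doc, scanned
        return None, scanned


start = datetime(2015, 1, 1)
p = TtlPartition()
for day in range(200):  # one new version per day, each with a 30-day TTL
    p.insert(start + timedelta(days=day), f"v{day}", timedelta(days=30))

now = start + timedelta(days=200)
print(p.dead(now))                        # expired versions at the old end
print(p.latest(now))                      # newest-first: stops at the first row
print(p.latest(now, newest_first=False))  # oldest-first: walks past every dead row
```

With newest-first ordering the read touches a single row even though 171 expired versions remain in the partition; a scan starting from the oldest end, shown only for contrast, has to step over all of them before it finds a live row.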

Hope it helps.

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 21 July 2015 at 05:59, Robert Wille <rwille@fold3.com> wrote:

> Data structures that have a recently-modified access pattern seem to be a
> poor fit for Cassandra. I’m wondering if any of you smart guys can provide
> suggestions.
>
> For the sake of discussion, let's assume I have the following tables:
>
> CREATE TABLE document (
>         docId UUID,
>         doc TEXT,
>         last_modified TIMEUUID,
>         PRIMARY KEY ((docid))
> );
>
> CREATE TABLE doc_by_last_modified (
>         date TEXT,
>         last_modified TIMEUUID,
>         docId UUID,
>         PRIMARY KEY ((date), last_modified)
> );
>
> When I update a document, I retrieve its last_modified time, delete the
> current record from doc_by_last_modified, and add a new one. Unfortunately,
> if you’d like each document to appear at most once in the
> doc_by_last_modified table, then this doesn’t work so well.
>
> Documents can get into the doc_by_last_modified table multiple times if
> there is concurrent access, or if there is a consistency issue.
>
> Any thoughts out there on how to efficiently provide recently-modified
> access to a table? This problem exists for many types of data structures,
> not just recently-modified. Any ordered data structure that can be
> dynamically reordered suffers from the same problems. As I’ve been doing
> schema design, this pattern keeps recurring. A nice way to address this
> problem has lots of applications.
>
> Thanks in advance for your thoughts
>
> Robert
>
>
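
The duplicate entries Robert describes in the quoted message (a document landing in doc_by_last_modified more than once under concurrent updates) can be reproduced with a deterministic sketch. It assumes plain Python dicts and sets stand in for the two tables, integers stand in for TIMEUUIDs, and each update is the read / delete / insert sequence from his description, with no isolation between the steps. The update_steps generator is a hypothetical helper used only to force one particular interleaving.

```python
document = {"a": 1}   # stand-in for document: docId -> current last_modified
index = {(1, "a")}    # stand-in for doc_by_last_modified: (last_modified, docId)


def update_steps(doc_id, new_ts):
    """One update from the quoted message, paused after the read step."""
    old_ts = document[doc_id]          # 1. read the current last_modified
    yield
    index.discard((old_ts, doc_id))    # 2. delete the old index entry
    index.add((new_ts, doc_id))        # 3. insert the new index entry
    document[doc_id] = new_ts
    yield


writer_a = update_steps("a", 2)
writer_b = update_steps("a", 3)
next(writer_a)  # A reads last_modified = 1
next(writer_b)  # B also reads last_modified = 1 before A has written
next(writer_a)  # A deletes (1, "a") and inserts (2, "a")
next(writer_b)  # B deletes (1, "a") again and inserts (3, "a")

print(sorted(index))  # document "a" is now listed twice
```

Both writers delete the same stale entry, so neither removes the entry the other one inserted; the same outcome arises whenever the read in step 1 returns an out-of-date value.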
