cassandra-user mailing list archives

From Alec Collier <Alec.Coll...@macquarie.com>
Subject RE: Schema questions for data structures with recently-modified access patterns
Date Wed, 22 Jul 2015 23:54:05 GMT
I believe what he really wants is to be able to search for the x most recently modified documents,
i.e. without specifying the docID.

I don’t believe there is a ‘nice’ way of doing this in Cassandra by itself, given it
really favours key-value storage. Even having the date as the partition key is usually not
recommended because it means all writes on a given date will be hitting one node.
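
One common way to soften the single-node hot spot described above is to spread each date over several partitions. The following is only a sketch under assumptions: the `shard` column, the table name `doc_by_last_modified_sharded`, and the idea of the client picking a shard (for example, a hash of the docId modulo a fixed shard count) are illustrative and do not come from the thread.

```cql
-- Hypothetical variant of doc_by_last_modified with a composite partition key.
-- Writes for one date are spread over several partitions (one per shard value)
-- instead of all landing in a single partition.
CREATE TABLE doc_by_last_modified_sharded (
    date TEXT,
    shard INT,                -- assumed: chosen by the client at write time
    last_modified TIMEUUID,
    docId UUID,
    PRIMARY KEY ((date, shard), last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

-- Prepared statement: run once per shard, then merge the results client-side
-- and keep the newest entries overall.
SELECT docId, last_modified
FROM doc_by_last_modified_sharded
WHERE date = ? AND shard = ?
LIMIT 10;
```

The trade-off is that a "most recent x" read now fans out to every shard of the date, so the shard count would need to stay small.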

Perhaps Solr integration is the way to go for this access pattern?

Alec Collier

From: Jack Krupansky [mailto:jack.krupansky@gmail.com]
Sent: Thursday, 23 July 2015 8:20 AM
To: user@cassandra.apache.org
Subject: Re: Schema questions for data structures with recently-modified access patterns

"No way to query recently-modified documents."

I don't follow why you say that. I mean, that was the point of the data model suggestion I
proposed. Maybe you could clarify.

I also wanted to mention that the new materialized view feature of Cassandra 3.0 might handle
this use case, including taking care of the delete, automatically.


-- Jack Krupansky

On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille <rwille@fold3.com> wrote:
The time series doesn’t provide the access pattern I’m looking for. No way to query recently-modified
documents.

On Jul 21, 2015, at 9:13 AM, Carlos Alonso <info@mrcalonso.com> wrote:


Hi Robert,

What about modelling it as a time series?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY (docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, the latest modification will always be the first record in the row, so
accessing it should be as easy as:

SELECT * FROM document WHERE docId = <the docId> LIMIT 1;

And, if you experience disk space issues due to very long rows, you can always expire
old ones using a TTL or in a batch job. Tombstones will never be a problem in this case because,
due to the specified clustering order, the latest modification will always be the first record
in the row.
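
As a sketch of the TTL option mentioned above, each revision can be written with its own expiry. The 30-day value is an assumption chosen only for illustration; the table is the time-series `document` table proposed in this message.

```cql
-- Prepared statement: bind docId, the modification timestamp, and the document body.
-- 2592000 seconds = 30 days (assumed retention); after that the revision expires.
INSERT INTO document (docId, last_modified, doc)
VALUES (?, ?, ?)
USING TTL 2592000;
```

Note that if a document is not modified within the TTL window, its only remaining revision expires as well, so the retention period would need to fit how often documents actually change.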

Hope it helps.

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 21 July 2015 at 05:59, Robert Wille <rwille@fold3.com> wrote:
Data structures that have a recently-modified access pattern seem to be a poor fit for Cassandra.
I’m wondering if any of you smart guys can provide suggestions.

For the sake of discussion, let's assume I have the following tables:

CREATE TABLE document (
        docId UUID,
        doc TEXT,
        last_modified TIMEUUID,
        PRIMARY KEY ((docid))
);

CREATE TABLE doc_by_last_modified (
        date TEXT,
        last_modified TIMEUUID,
        docId UUID,
        PRIMARY KEY ((date), last_modified)
);

When I update a document, I retrieve its last_modified time, delete the current record from
doc_by_last_modified, and add a new one. Unfortunately, if you’d like each document to appear
at most once in the doc_by_last_modified table, then this doesn’t work so well.

Documents can get into the doc_by_last_modified table multiple times if there is concurrent
access, or if there is a consistency issue.
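
The update sequence described above can be sketched as the following prepared statements against the two tables. The bind markers and the ordering of the steps are assumptions made for illustration; the new TIMEUUID is assumed to be generated once by the client so that both tables receive the same value.

```cql
-- 1. Read the document's current last_modified (bind: docId).
SELECT last_modified FROM document WHERE docId = ?;

-- 2. Delete the old index entry (bind: date derived from the old
--    last_modified, and the old last_modified itself).
DELETE FROM doc_by_last_modified WHERE date = ? AND last_modified = ?;

-- 3. Insert the new index entry (bind: new date, new last_modified, docId).
INSERT INTO doc_by_last_modified (date, last_modified, docId) VALUES (?, ?, ?);

-- 4. Store the new content and timestamp (bind: doc, new last_modified, docId).
UPDATE document SET doc = ?, last_modified = ? WHERE docId = ?;

-- Race: if two writers both run step 1 before either reaches step 4, both
-- delete the same old entry in step 2 and each inserts its own entry in
-- step 3, leaving the document in doc_by_last_modified twice.
```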

Any thoughts out there on how to efficiently provide recently-modified access to a table?
This problem exists for many types of data structures, not just recently-modified. Any ordered
data structure that can be dynamically reordered suffers from the same problems. As I’ve
been doing schema design, this pattern keeps recurring. A nice way to address this problem
has lots of applications.

Thanks in advance for your thoughts

Robert



