cassandra-commits mailing list archives

From "Rick Branson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5357) Query cache
Date Tue, 26 Nov 2013 19:14:38 GMT


Rick Branson commented on CASSANDRA-5357:

Perhaps an anecdote from a production system might help find a simple, yet useful improvement
to the row cache. Facebook's TAO distributed storage system supports a data model called "assocs"
which are basically just graph edges, and nodes assigned to a given assoc ID hold a write-through
cache of the state. The assoc storage can be roughly considered a more use-case specific CF.
For large assocs with many thousands of edges, TAO only maintains the tail of the assoc in
memory, as those tend to be the most "interesting" portions of data. More of the details are
discussed in the linked paper[1].

Perhaps instead of a total overhaul, what's really needed is to evolve the row cache by modifying
it to cache only the head of the row and its bounds. In contrast to the complexity of trying
to match queries & mutations to a set of serialized query filter objects, the cache only
needs to maintain one interval for each row at most. This would provide a very simple write-through
story. On review, our production wide-row use cases seem to fall into two camps.
The first and most read-performance sensitive is vastly skewed towards reads on the head of
the row (>90% of the time) with a fixed limit. The second is randomly distributed slice
queries which would not seem to provide a very good cache hit rate either way.
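
To make the head-only idea concrete, here is a minimal sketch of a cache that keeps one interval per row: the first few columns plus an implicit upper bound (the last cached column name). Everything in it is an assumption for illustration: the names HeadRowCache, CachedHead, populate, read_head and write are hypothetical and not Cassandra APIs, column names are plain strings in natural sort order rather than a real comparator, and deletes/tombstones are ignored.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Column = Tuple[str, str]  # (column name, value); plain strings are an assumption


@dataclass
class CachedHead:
    columns: List[Column] = field(default_factory=list)  # sorted by column name
    complete: bool = False  # True when the cached head is the entire row


class HeadRowCache:
    """Hypothetical head-of-row cache; not an actual Cassandra class."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # max columns cached per row
        self.rows: Dict[str, CachedHead] = {}

    def populate(self, key: str, head: List[Column], complete: bool) -> None:
        """Install a head read from storage; `complete` means no columns follow it."""
        if len(head) > self.capacity:
            head, complete = head[: self.capacity], False
        self.rows[key] = CachedHead(list(head), complete)

    def read_head(self, key: str, limit: int) -> Optional[List[Column]]:
        """Return the first `limit` columns, or None on a cache miss."""
        entry = self.rows.get(key)
        if entry is None:
            return None
        if limit <= len(entry.columns) or entry.complete:
            return entry.columns[:limit]
        return None  # request extends past the cached interval

    def write(self, key: str, name: str, value: str) -> None:
        """Write-through: apply the mutation only if it lands inside the cached interval."""
        entry = self.rows.get(key)
        if entry is None:
            return  # row not cached; storage alone takes the write
        if not entry.complete:
            if not entry.columns or name > entry.columns[-1][0]:
                return  # beyond the cached upper bound
        names = [c[0] for c in entry.columns]
        i = bisect.bisect_left(names, name)
        if i < len(names) and names[i] == name:
            entry.columns[i] = (name, value)  # overwrite existing column
        else:
            entry.columns.insert(i, (name, value))
        if len(entry.columns) > self.capacity:
            entry.columns.pop()  # evict the tail column; bound moves inward
            entry.complete = False
```

In this sketch a fixed-limit read on the head is a hit whenever the limit fits inside the cached interval, and a mutation needs only a single comparison against the row's upper bound to decide whether the cache must be updated. Slice queries elsewhere in the row simply miss and go to storage.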


> Query cache
> -----------
>                 Key: CASSANDRA-5357
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
> I think that most people expect the row cache to act like a query cache, because that's
> a reasonable model. Caching the entire partition is, in retrospect, not really reasonable,
> so it's not surprising that it catches people off guard, especially given the confusion we've
> inflicted on ourselves as to what a "row" constitutes.
> I propose replacing it with a true query cache.

This message was sent by Atlassian JIRA