cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5357) Query cache
Date Tue, 07 Jan 2014 18:54:19 GMT


Jonathan Ellis commented on CASSANDRA-5357:

bq. I think we could also do some intelligent sizing of the cache per-CF with the metrics
we keep, that would be relatively static (so impervious to churn).

I'm not sure what I was thinking here.  (Maybe that we'd only need one cached partition per
CF, which is nonsense.)  We do need LRU or similar behavior at a high level, just like we do
with the row cache today.

The question is, how much of each partition do we cache?  I think it's a lot simpler if we
decide we'll cache the same amount for each partition in a CF, and not try to be clever and
"extend" a cached partition when we query for more later.

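To make the "same amount for each partition, no extending" idea concrete, here is a minimal sketch in Python (not Cassandra's Java, and not a proposed implementation).  The class name PartitionHeadCache and its parameters are hypothetical; it assumes the cache keeps only the first N cells of each partition and evicts whole partitions in LRU order.

```python
from collections import OrderedDict


class PartitionHeadCache:
    """Hypothetical per-CF cache holding the first N cells of each partition."""

    def __init__(self, cells_per_partition, max_partitions):
        self.cells_per_partition = cells_per_partition
        self.max_partitions = max_partitions
        self._entries = OrderedDict()  # partition key -> cached head cells

    def get(self, partition_key, limit):
        """Return the first `limit` cells, or None when the cache cannot answer."""
        if limit > self.cells_per_partition:
            # Never "extend" an entry: larger queries simply bypass the cache.
            return None
        cells = self._entries.get(partition_key)
        if cells is None:
            return None
        self._entries.move_to_end(partition_key)  # mark as most recently used
        return cells[:limit]

    def put(self, partition_key, cells):
        """Store at most cells_per_partition cells, evicting LRU partitions."""
        self._entries[partition_key] = list(cells[: self.cells_per_partition])
        self._entries.move_to_end(partition_key)
        while len(self._entries) > self.max_partitions:
            self._entries.popitem(last=False)  # drop the least recently used


cache = PartitionHeadCache(cells_per_partition=3, max_partitions=2)
cache.put("a", [1, 2, 3, 4, 5])
cache.put("b", [10, 20])
head = cache.get("a", 2)  # served from the cached head of partition "a"
```

Because every entry has the same fixed shape, a lookup only needs to compare the query's LIMIT against one per-CF number, which is the simplification argued for above.
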
So how much do we cache?  We can either

# Make the user configure it, which requires creating new CQL syntax, or
# Determine it automatically

Personally I'd lean towards (2):
# Track an EstimatedHistogram of LIMITs in qualifying queries
# Set the cells-to-cache per CF so that we maximize the queries we can satisfy for a given
cache size
# I think this also means we should go back to a separate cache per CF with its own size limit
-- if we have 1000 queries/s against CF X's cache, then we shouldn't throw those away when
a query against CF Y comes in where we expect only 10/s
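
As a rough illustration of the first two steps, the sketch below picks a cells-to-cache value from observed LIMITs.  It is Python rather than Cassandra's Java, it uses a plain dict of counts in place of EstimatedHistogram, and the function name, the hot_partitions parameter, and the scoring model (uniform access across a known number of hot partitions) are all assumptions made for the example.

```python
def choose_cells_to_cache(limit_counts, cache_budget_cells, hot_partitions):
    """Pick a per-partition cell count k for one CF.

    limit_counts maps an observed LIMIT to how many queries used it.
    Assumed model: a query is satisfiable when its LIMIT <= k and its
    partition is cached; the cache holds cache_budget_cells // k partitions
    out of hot_partitions equally popular ones.
    Returns (k, estimated fraction of queries satisfied).
    """
    total = sum(limit_counts.values())
    if total == 0 or cache_budget_cells <= 0 or hot_partitions <= 0:
        return 0, 0.0

    best_k, best_score = 0, 0.0
    covered = 0
    # Only observed LIMITs need checking: between two observed values the
    # covered fraction is flat while the partition count can only shrink.
    for k in sorted(limit_counts):
        covered += limit_counts[k]
        if k <= 0:
            continue  # ignore nonsensical LIMITs
        if k > cache_budget_cells:
            break  # not even one partition would fit
        limit_fraction = covered / total
        partitions_held = cache_budget_cells // k
        partition_fraction = min(1.0, partitions_held / hot_partitions)
        score = limit_fraction * partition_fraction
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# 70% of queries use LIMIT 10, 25% use LIMIT 100, 5% use LIMIT 1000.
observed = {10: 700, 100: 250, 1000: 50}
k, score = choose_cells_to_cache(observed, cache_budget_cells=10_000,
                                 hot_partitions=100)
```

With these made-up numbers, caching 100 cells per partition covers 95% of queries while still fitting all 100 hot partitions; caching 1000 cells would cover every LIMIT but hold only 10 partitions.  Running the same calculation per CF, each against its own budget, matches the third point about keeping the caches separate.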

In the interest of shipping sooner rather than later, though, I'll take whatever we can reasonably
do for 2.1.0 and push the rest out to improve later.  Even a single "cache this
many cells" parameter in cassandra.yaml is still better than people OOMing themselves
with the classic row cache.

> Query cache
> -----------
>                 Key: CASSANDRA-5357
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>             Fix For: 2.1
> I think that most people expect the row cache to act like a query cache, because that's
> a reasonable model.  Caching the entire partition is, in retrospect, not really reasonable,
> so it's not surprising that it catches people off guard, especially given the confusion we've
> inflicted on ourselves as to what a "row" constitutes.
> I propose replacing it with a true query cache.

