cassandra-commits mailing list archives

From "Sylvain Lebresne (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache
Date Thu, 09 Feb 2012 08:59:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204371#comment-13204371
] 

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq.  it should also solve the problems which we are discussing in this ticket

What are those?

I'd like us to be a little scientific on that issue. What is it we are trying to do in the
first place? My take on that (and please feel free to correct me if I'm missing something)
is that the kinds of caching that I can really see being useful in practice are:
# Caching a row entirely; that's what we do and I think we agree we should keep that feature
because sometimes that's what you want.
# Caching the head or the tail of a row for wide rows.
# I could also imagine cases where you want to only pin a few columns (by name) into the cache
without keeping the row entirely.
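
To make the three cases concrete, here is a minimal Python sketch of them as filters applied to a row before it is cached. It is only an illustration: the class names (WholeRow, Slice, NamedColumns) and the representation of a row as a name-sorted list of (name, value) pairs are assumptions made for this example, not Cassandra's actual classes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical model: a row is a list of (column name, value) pairs sorted by name.
Column = Tuple[str, str]
Row = List[Column]


@dataclass(frozen=True)
class WholeRow:
    """Case 1: cache the row entirely."""

    def apply(self, row: Row) -> Row:
        return list(row)


@dataclass(frozen=True)
class Slice:
    """Case 2: cache only the first (or last) `count` columns of a wide row."""

    count: int
    from_tail: bool = False

    def apply(self, row: Row) -> Row:
        if self.count <= 0:
            return []
        return list(row[-self.count:]) if self.from_tail else list(row[:self.count])


@dataclass(frozen=True)
class NamedColumns:
    """Case 3: pin a few columns, selected by name."""

    names: FrozenSet[str]

    def apply(self, row: Row) -> Row:
        return [col for col in row if col[0] in self.names]
```

For a row with columns a, b, c, d, `Slice(2)` keeps a and b, `Slice(2, from_tail=True)` keeps c and d, and `NamedColumns(frozenset({"b", "d"}))` keeps b and d.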

And well, that's it. I've tried to think of other types of (not far-fetched, hypothetical) workloads
where caching could be a notable win but that are not handled by the 3 cases above, and I can't
really find one. Now apparently I'm being stupid and missing 90% of situations, since:

bq. but I see a true query cache as being better than the row cache in 90% of situations

because the 3 cases above are perfectly handled by the idea of just adding a filter per-cf
to our current row cache (which btw could easily be extended to 2-3 filters per-cf if that
proves necessary). So please let's share those cases that are not above and that we want to
handle as part of this ticket.

But if what's above does sum up the problem we want to solve, then I continue to think that
simply adding a per-cf filter alongside our current row cache is the best solution:
* there is *no* memory overhead.
* all 3 caching use cases above are handled without any drawback that I can think of.
* it's an incremental change to the existing code, not a completely new thing, thus lowering the
risk of introducing new bugs. Typically, I can easily see how CASSANDRA-3862 will translate
to that solution; but I suspect things may get more complicated for, say, a query cache.
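
A rough Python sketch of what "a per-cf filter alongside the current row cache" could look like is shown here, under stated assumptions. FilteredRowCache, configure, read_head and the head-only filter are hypothetical names and simplifications chosen for illustration; they are not the code in the attached patches. The point it tries to show is that the cache key stays (cf, row key) with one entry per row, exactly as in a plain row cache, and only the stored value is trimmed by the filter.

```python
from typing import Callable, Dict, List, Optional, Tuple

Column = Tuple[str, str]
Row = List[Column]


class FilteredRowCache:
    """Row cache with one optional 'head of row' filter per column family."""

    def __init__(self, load_row: Callable[[str, str], Row]):
        self._load_row = load_row  # reads the full row from storage
        self._head_limit: Dict[str, Optional[int]] = {}
        self._entries: Dict[Tuple[str, str], Row] = {}
        self.hits = 0
        self.misses = 0

    def configure(self, cf: str, head_limit: Optional[int]) -> None:
        # None means "cache the whole row", i.e. the unfiltered behaviour.
        self._head_limit[cf] = head_limit

    def invalidate(self, cf: str, key: str) -> None:
        # Called on writes so the cached entry never goes stale.
        self._entries.pop((cf, key), None)

    def read_head(self, cf: str, key: str, n: int) -> Row:
        limit = self._head_limit.get(cf)
        if limit is not None and n > limit:
            # The query asks for more than the filter keeps: bypass the cache.
            return self._load_row(cf, key)[:n]
        entry = self._entries.get((cf, key))
        if entry is None:
            self.misses += 1
            row = self._load_row(cf, key)
            entry = row if limit is None else row[:limit]
            self._entries[(cf, key)] = entry
        else:
            self.hits += 1
        return entry[:n]
```

In this sketch a column family that is never configured keeps whole-row caching, so the filter is purely opt-in.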

The only criticism that I've seen so far of that solution is the question of the user configuration
of the cache, while for the query cache there wouldn't be a configuration (which remains to
be proven btw if we want to support the 'stick a row entirely in cache always' case). If someone
considers that auto-configuration should be an absolute priority then let's discuss that, because
I disagree with that (to sum up, I think any auto-configuration of caches will have drawbacks,
so I think users should be able to override the default, and so I think it's more sane to start
with a cache that users can make do what they want and then evaluate how to make that configuration
mostly automatic, which I think can be done).

So before considering other solutions, I'd like to first understand more clearly why we're
discarding that per-cf filter idea. Because currently it seems to strike a pretty nice balance
between fixing what seems to be the problem and the added complexity.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch,
0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch,
0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently
have to warn against using the row cache with wide rows, where the read pattern is typically
a peek at the head, but this use case would be perfectly supported by a cache that stored only
columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have
some gotchas for weird usage patterns, and it requires the list overhead
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary
index to look up cache entries by rowkey so that you can keep them in sync with the memtable
> * others?
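
For illustration, the "rowkey+filterid" option from the list above might be sketched in Python as follows. FilterKeyedCache and its methods are hypothetical names, and the sketch simplifies "keeping entries in sync with the memtable" to dropping every cached entry for a row when that row is written; an implementation could instead update the entries in place.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Set, Tuple

Column = Tuple[str, str]


class FilterKeyedCache:
    """Cache keyed by (row key, filter id), with a secondary index by row key."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, Hashable], List[Column]] = {}
        # Secondary index: row key -> filter ids that currently have an entry.
        self._by_row: Dict[str, Set[Hashable]] = defaultdict(set)

    def put(self, row_key: str, filter_id: Hashable, columns: List[Column]) -> None:
        self._entries[(row_key, filter_id)] = list(columns)
        self._by_row[row_key].add(filter_id)

    def get(self, row_key: str, filter_id: Hashable) -> Optional[List[Column]]:
        return self._entries.get((row_key, filter_id))

    def invalidate_row(self, row_key: str) -> int:
        """Drop all entries for a row; returns how many were removed."""
        filter_ids = self._by_row.pop(row_key, set())
        for filter_id in filter_ids:
            del self._entries[(row_key, filter_id)]
        return len(filter_ids)
```

Without the secondary index, a write to a row would require scanning the whole cache to find the entries that belong to it.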

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
