cassandra-commits mailing list archives

From "Sylvain Lebresne (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache
Date Fri, 10 Feb 2012 16:37:07 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205528#comment-13205528 ]

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq. The filter approach allows us to make slice-based queries more efficient (somewhat clumsily)

What is so clumsy?

bq. but doesn't really address the inefficiency for name-based queries

Depends on what we're talking about. The filter approach would allow setting a name-based filter.
Granted, that is less convenient. But the query cache is not perfect either. If you issue different
name-based queries, we will end up caching the same data multiple times. We may be able to optimize
this, but then it becomes fairly complicated.
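To make the duplication concrete, here is a minimal Java sketch, assuming a hypothetical query cache keyed by the row key plus the exact set of requested column names (`NameQueryCacheSketch` and its methods are illustrative only, not Cassandra's API). Two name-based queries on the same row that overlap on columns `b` and `c` each get their own entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not Cassandra code): a query cache keyed by the row key
 * plus the exact set of requested column names. Overlapping name-based queries
 * on the same row get separate entries, so shared columns are stored twice.
 */
public class NameQueryCacheSketch {
    // Cache key: row key plus a defensive sorted copy of the requested names.
    record QueryKey(String rowKey, SortedSet<String> names) {}

    private final Map<QueryKey, SortedMap<String, String>> cache = new HashMap<>();

    /** Returns the cached result for this query, computing it from `row` on a miss. */
    public SortedMap<String, String> get(String rowKey, Set<String> names,
                                         SortedMap<String, String> row) {
        QueryKey key = new QueryKey(rowKey, new TreeSet<>(names));
        return cache.computeIfAbsent(key, k -> {
            SortedMap<String, String> result = new TreeMap<>();
            for (String name : k.names()) {
                if (row.containsKey(name)) {
                    result.put(name, row.get(name));
                }
            }
            return result;
        });
    }

    /** Total number of column copies held across all cache entries. */
    public int storedColumns() {
        int total = 0;
        for (SortedMap<String, String> columns : cache.values()) {
            total += columns.size();
        }
        return total;
    }

    public static void main(String[] args) {
        SortedMap<String, String> row =
            new TreeMap<>(Map.of("a", "1", "b", "2", "c", "3", "d", "4"));
        NameQueryCacheSketch cache = new NameQueryCacheSketch();
        cache.get("k1", Set.of("a", "b", "c"), row);
        cache.get("k1", Set.of("b", "c", "d"), row);
        // The row has 4 distinct columns, but "b" and "c" are now cached twice.
        System.out.println(cache.storedColumns()); // prints 6
    }
}
```

Sharing column data across entries would avoid the duplicate copies, but that is the kind of optimization that makes the cache fairly complicated.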

bq. while with a true query cache we could do write-through updates on 2I queries as well

I'm not sure I understand; could you clarify your idea?

Don't get me wrong, I'm not totally closed to the idea of a query cache or something similar,
but I do want to make sure we don't jump on it without good reasoning behind it, because I
do fear a query cache will come with a bunch of complications (and while you may have good
reasoning, I personally don't yet see clearly that it's the best choice, so I'll need some
convincing). The query cache also risks caching the same thing multiple times. Take
a CF on which you do some paging: if the row receives a few updates, we'll end up re-caching
the same things multiple times (unless we're really smart about it, but I'm pretty sure it's
not a simple problem). I'm not sure how much of a problem that'll be in practice, but ...
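A minimal sketch of the paging case, assuming a hypothetical slice cache keyed by (row key, start column, page size), an inclusive slice start, and a write-through policy that refreshes every cached slice of the row on update; `PagingQueryCacheSketch` and its methods are illustrative, not Cassandra's API. Paging through six columns already stores each page-boundary column twice, and a single insert near the head shifts every later page boundary, so the next pass creates new entries that overlap the old ones.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not Cassandra code): a slice query cache for one wide row,
 * keyed by (row key, start column, page size). Assumes the slice start is
 * inclusive, so each page begins with the last column of the previous one,
 * and assumes a write-through policy that refreshes every cached slice of the row.
 */
public class PagingQueryCacheSketch {
    record SliceKey(String rowKey, String start, int count) {}

    private static final String ROW_KEY = "k1";
    private final TreeMap<String, String> row = new TreeMap<>();  // stand-in for storage
    private final Map<SliceKey, List<String>> cache = new HashMap<>();

    /** Names of up to `count` columns starting at `start` (inclusive). */
    private List<String> readSlice(String start, int count) {
        List<String> names = new ArrayList<>();
        for (String name : row.tailMap(start, true).keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    /** Returns one page, served from the cache when this exact query was seen before. */
    public List<String> page(String start, int count) {
        return cache.computeIfAbsent(new SliceKey(ROW_KEY, start, count),
                                     k -> readSlice(k.start(), k.count()));
    }

    /** Pages through the whole row, restarting each page at the last column seen. */
    public void pageThroughRow(int count) {
        if (count < 2) {
            throw new IllegalArgumentException("inclusive paging needs count >= 2");
        }
        String start = "";
        while (true) {
            List<String> names = page(start, count);
            if (names.size() < count) {
                return;
            }
            start = names.get(names.size() - 1);
        }
    }

    /** Write-through: update the row, then refresh every cached slice of it. */
    public void insert(String name, String value) {
        row.put(name, value);
        cache.replaceAll((k, v) -> readSlice(k.start(), k.count()));
    }

    /** Total number of column copies held across all cache entries. */
    public int storedColumns() {
        int total = 0;
        for (List<String> names : cache.values()) {
            total += names.size();
        }
        return total;
    }

    public static void main(String[] args) {
        PagingQueryCacheSketch cache = new PagingQueryCacheSketch();
        for (String name : new String[] {"a", "b", "c", "d", "e", "f"}) {
            cache.insert(name, "v");
        }
        cache.pageThroughRow(3);
        System.out.println(cache.storedColumns()); // prints 8 (6 distinct columns)
        cache.insert("bb", "v");  // shifts every later page boundary
        cache.pageThroughRow(3);
        System.out.println(cache.storedColumns()); // prints 15 (7 distinct columns)
    }
}
```

The entries for the old boundaries (for example the page starting at `c`) are no longer requested after the insert but stay in the cache until evicted.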

Then there is also the fact that the usual way to model data in C* is with one CF per kind of
query. So it does feel like caching each query separately shouldn't be necessary. But that's
not a technical argument.
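With one CF per kind of query, a single filter per CF may be enough. A minimal sketch of that filter approach, assuming the filter is "the first `headSize` columns of the row" (the wide-row "peek at the head" pattern from the ticket) and that deletes and TTLs can be ignored; `HeadFilterCacheSketch` is hypothetical. Each row is cached once, and any slice that falls inside the cached head is served from it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not Cassandra code): a row+filter cache with one fixed
 * "head" filter per column family, caching only the first `headSize` columns
 * of each row in comparator order. Deletes and TTLs are deliberately ignored.
 */
public class HeadFilterCacheSketch {
    private final int headSize;
    private final Map<String, TreeMap<String, String>> heads = new HashMap<>();

    public HeadFilterCacheSketch(int headSize) {
        if (headSize < 1) {
            throw new IllegalArgumentException("headSize must be >= 1");
        }
        this.headSize = headSize;
    }

    /** Caches the head of a row from its full contents (e.g. after a read miss). */
    public void load(String rowKey, SortedMap<String, String> fullRow) {
        TreeMap<String, String> head = new TreeMap<>();
        for (Map.Entry<String, String> e : fullRow.entrySet()) {
            if (head.size() == headSize) {
                break;
            }
            head.put(e.getKey(), e.getValue());
        }
        heads.put(rowKey, head);
    }

    /**
     * Answers a slice (start inclusive, at most `count` columns) from the cache,
     * or returns null when the cached head cannot prove the answer is complete.
     */
    public List<String> slice(String rowKey, String start, int count) {
        TreeMap<String, String> head = heads.get(rowKey);
        if (head == null) {
            return null;
        }
        List<String> names = new ArrayList<>();
        for (String name : head.tailMap(start, true).keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        // A head with fewer than headSize columns holds the whole row.
        boolean wholeRowCached = head.size() < headSize;
        return (names.size() == count || wholeRowCached) ? names : null;
    }

    /** Write-through update that keeps the cached head equal to the row's first columns. */
    public void write(String rowKey, String name, String value) {
        TreeMap<String, String> head = heads.get(rowKey);
        if (head == null) {
            return;  // row not cached, nothing to keep in sync
        }
        boolean full = head.size() == headSize;
        if (!full || head.containsKey(name) || name.compareTo(head.lastKey()) < 0) {
            head.put(name, value);
            if (head.size() > headSize) {
                head.pollLastEntry();  // the old last column falls out of the head
            }
        }
    }
}
```

Returning null is the conservative choice: once the head is full, a slice that runs past its last column cannot be answered without reading the row.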
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch,
>                      0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch,
>                      0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently
> have to warn against using the row cache with wide rows, where the read pattern is typically
> a peek at the head, but this use case would be perfectly supported by a cache that stored only
> columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have
> some gotchas for weird usage patterns, and it requires the list overhead
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary
> index to look up cache entries by rowkey so that you can keep them in sync with the memtable
> * others?
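The "rowkey+filterid" option above could look roughly like the following Java sketch. It assumes filters can be identified by an id and expressed as predicates on column names (which covers name sets and bounded ranges, but not count-limited slices); `RowFilterCacheSketch`, `CacheKey`, and the filter registry are hypothetical. The secondary index from row key to cached filter ids is what lets a write reach every entry of its row, and eviction has to maintain that index as well.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not Cassandra code) of the "rowkey+filterid" option:
 * entries are keyed by (row key, filter id), and a secondary index from row key
 * to cached filter ids lets a write find and update every entry of that row.
 * Filters are modeled as predicates on column names only.
 */
public class RowFilterCacheSketch {
    record CacheKey(String rowKey, String filterId) {}

    private final Map<String, Predicate<String>> filters = new HashMap<>();
    private final Map<CacheKey, TreeMap<String, String>> entries = new HashMap<>();
    // Secondary index: row key -> ids of the filters cached for that row.
    private final Map<String, Set<String>> filterIdsByRow = new HashMap<>();

    public void registerFilter(String filterId, Predicate<String> matchesColumn) {
        filters.put(filterId, matchesColumn);
    }

    /** Caches the columns of `fullRow` that match the filter. */
    public void put(String rowKey, String filterId, SortedMap<String, String> fullRow) {
        Predicate<String> filter = filters.get(filterId);
        if (filter == null) {
            throw new IllegalArgumentException("unknown filter: " + filterId);
        }
        TreeMap<String, String> columns = new TreeMap<>();
        for (Map.Entry<String, String> e : fullRow.entrySet()) {
            if (filter.test(e.getKey())) {
                columns.put(e.getKey(), e.getValue());
            }
        }
        entries.put(new CacheKey(rowKey, filterId), columns);
        filterIdsByRow.computeIfAbsent(rowKey, k -> new HashSet<>()).add(filterId);
    }

    /** Returns the cached columns, or null on a miss. */
    public SortedMap<String, String> get(String rowKey, String filterId) {
        return entries.get(new CacheKey(rowKey, filterId));
    }

    /** Write-through: the secondary index finds every cached entry of the row. */
    public void write(String rowKey, String name, String value) {
        for (String filterId : filterIdsByRow.getOrDefault(rowKey, Set.of())) {
            if (filters.get(filterId).test(name)) {
                entries.get(new CacheKey(rowKey, filterId)).put(name, value);
            }
        }
    }

    /** Eviction has to keep the secondary index consistent too. */
    public void evict(String rowKey, String filterId) {
        entries.remove(new CacheKey(rowKey, filterId));
        Set<String> ids = filterIdsByRow.get(rowKey);
        if (ids != null) {
            ids.remove(filterId);
            if (ids.isEmpty()) {
                filterIdsByRow.remove(rowKey);
            }
        }
    }
}
```

The extra bookkeeping on every write and eviction is the cost the ticket alludes to when it says this option needs a secondary index.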

