cassandra-commits mailing list archives

From "Vijay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5357) Query cache
Date Thu, 08 Aug 2013 04:54:53 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733151#comment-13733151 ]

Vijay commented on CASSANDRA-5357:
----------------------------------

Hi Jonathan, the idea in the current implementation is as follows:

The QueryCache<QueryFilter,CF> is implemented on top of SerializedCache. The Map's key is
a RowCacheKey<RowKey, CFID> (the same as the earlier RowCache), and the Map's value is a
composite value, QueryCacheValue<[Query, ....], ColumnFamily>.

For every new query that enters the system, we generate the RowCacheKey from the QueryFilter
and look up the QueryCacheValue to check whether the IFilter exists. If it does, we return the
CF. Otherwise we get the QueryCacheValue (or create a new one if no QCV exists), add the IFilter
to the QCV, and merge the results with the existing ColumnFamily (also in the QCV), which is in
turn serialized.
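
The lookup-and-merge flow described above can be sketched roughly as follows. This is only
an illustration, not the patch: the class names, the SliceFilter type, the in-memory
"storage" map, and the use of plain String column names are all assumptions made for the
example, and serialization of the cached value is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; none of these types are the real Cassandra classes. */
final class QueryCacheSketch {
    /** Stand-in for an IFilter: an inclusive slice over column names. */
    static final class SliceFilter {
        final String start, end;
        SliceFilter(String start, String end) { this.start = start; this.end = end; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof SliceFilter)) return false;
            SliceFilter other = (SliceFilter) o;
            return start.equals(other.start) && end.equals(other.end);
        }
        @Override public int hashCode() { return Objects.hash(start, end); }
    }

    /** Stand-in for QueryCacheValue: the filters seen so far plus one merged column set. */
    static final class CachedRow {
        final Set<SliceFilter> filters = new HashSet<>();
        final TreeMap<String, String> columns = new TreeMap<>();
    }

    // Stand-in for the on-disk data: row key -> columns (assumed, for illustration only).
    final Map<String, TreeMap<String, String>> storage = new HashMap<>();
    // The cache is keyed by row key only, as with the RowCacheKey described above.
    private final Map<String, CachedRow> cache = new HashMap<>();
    int storageReads = 0;

    TreeMap<String, String> get(String rowKey, SliceFilter filter) {
        CachedRow row = cache.get(rowKey);
        if (row != null && row.filters.contains(filter)) {
            // Known filter: answer from the merged columns without touching storage.
            return slice(row.columns, filter);
        }
        // Unknown filter: read from storage, then merge into the (possibly new) cached row.
        storageReads++;
        TreeMap<String, String> stored = storage.getOrDefault(rowKey, new TreeMap<>());
        TreeMap<String, String> fresh = slice(stored, filter);
        if (row == null) {
            row = new CachedRow();
            cache.put(rowKey, row);
        }
        row.filters.add(filter);
        row.columns.putAll(fresh); // overlapping columns are stored once
        return fresh;
    }

    int cachedColumnCount(String rowKey) {
        CachedRow row = cache.get(rowKey);
        return row == null ? 0 : row.columns.size();
    }

    private static TreeMap<String, String> slice(TreeMap<String, String> cols, SliceFilter f) {
        return new TreeMap<>(cols.subMap(f.start, true, f.end, true));
    }
}
```

In this sketch, two overlapping slices on the same row share one merged column map, so a
column covered by both is held only once.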

Advantages:
1) Queries can overlap. There can be any number of queries on a row, but the data is not
duplicated among them.
2) When we want to invalidate, we just invalidate the RowKey and every cached QueryCacheValue
for it goes away (this avoids another Map for bookkeeping and is hence a little more memory
efficient).
3) There is a property which the user can enable to cache the whole row no matter what the
query is (the current patch adds the overhead of deserializing the identity filter, but that
can be fixed).
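
To illustrate advantage 2, here is a minimal sketch of invalidation by row key. The class
and method names are made up for the example, and queries are reduced to plain String ids.
Because everything cached for a row hangs off a single map entry, removing that entry drops
every cached query for the row without a second bookkeeping map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of row-keyed invalidation; not the patch's actual code. */
final class RowInvalidationSketch {
    // row key -> ids of the queries currently cached for that row
    private final Map<String, Set<String>> cachedQueriesByRow = new HashMap<>();

    void remember(String rowKey, String queryId) {
        cachedQueriesByRow.computeIfAbsent(rowKey, k -> new HashSet<>()).add(queryId);
    }

    boolean isCached(String rowKey, String queryId) {
        Set<String> queries = cachedQueriesByRow.get(rowKey);
        return queries != null && queries.contains(queryId);
    }

    /** A write to the row removes one entry, and with it every cached query for the row. */
    void invalidate(String rowKey) {
        cachedQueriesByRow.remove(rowKey);
    }
}
```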

Of course there are disadvantages:
1) The LRU algorithm is no longer really accurate. When a single query is hot, we have no way
of invalidating the other queries on the same row, since they all share the same hit rate
(which is no worse than what we have currently).
2) With multiple types of queries on the same row (which is something of an edge case), we
might pull the whole row's data into memory (this can be mitigated by loading it incrementally
or by holding an index in the filter, which doesn't exist in the current patch).
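
Disadvantage 1 can be seen in a small sketch that uses java.util.LinkedHashMap in access
order as a stand-in LRU. This is an assumption for illustration only and is not how the
patch or the existing cache tracks recency. Since recency is recorded per row entry, a hit
on one hot query keeps every other query on that row alive as well.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: LRU eviction at row granularity, with queries as String ids. */
final class RowGranularLruSketch {
    private final LinkedHashMap<String, Set<String>> rows;

    RowGranularLruSketch(final int maxRows) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.rows = new LinkedHashMap<String, Set<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Set<String>> eldest) {
                return size() > maxRows;
            }
        };
    }

    void cache(String rowKey, String queryId) {
        Set<String> queries = rows.get(rowKey);
        if (queries == null) {
            queries = new HashSet<>();
            rows.put(rowKey, queries); // may evict the least recently used row
        }
        queries.add(queryId);
    }

    boolean hit(String rowKey, String queryId) {
        Set<String> queries = rows.get(rowKey); // refreshes the whole row, not one query
        return queries != null && queries.contains(queryId);
    }
}
```

With a capacity of two rows, repeatedly hitting one query on a row keeps a never-used query
on the same row cached, while a different row is evicted instead.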

There could be more that I overlooked...
                
> Query cache
> -----------
>
>                 Key: CASSANDRA-5357
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>
> I think that most people expect the row cache to act like a query cache, because that's
> a reasonable model.  Caching the entire partition is, in retrospect, not really reasonable,
> so it's not surprising that it catches people off guard, especially given the confusion we've
> inflicted on ourselves as to what a "row" constitutes.
> I propose replacing it with a true query cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
