cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache
Date Tue, 28 Jan 2014 11:20:53 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884030#comment-13884030
] 

Sylvain Lebresne commented on CASSANDRA-5357:
---------------------------------------------

A bunch of rermaks on the attached patch:
* I don't think we can count CQL rows as the patch does. Namely, we can't rely on 'the number
of cells' * 'the number of declared CQL columns' as being the number of CQL rows, since many
rows may not have all columns set. Typically, the cacheFilter in getThroughCache might finish
in the middle of a CQL row (since it blindly asks for cells, not CQL rows) which would be
bad. Also, for the same reason, cacheFilter might fetch less CQL rows than requested by the
user when we use it to query data. Lastly, 'the number of declared CQL columns' can change
at runtime, which would broke the cache (for that last one, we could invalidate the cache
when new columns are added/removed, but it's better if we don't have to).
So anyway, we really should always count CQL rows, which implies using SliceQueryFilter.lastCounted()
and ColumnCounter. This does mean that unless we actually store the row count with each cached
CF object, we might have to re-count every time we have a cache hit, but this is probably
ok for now (we can optimize later).
* In CFS.getThroughCache, on a cache hit, I believe the condition to return the cached value
should be something like:
{noformat}
if (wholePartitionCached || (filter.isHeadFilter() && filter.filter.getCount() * (metadata.regularColumns().size()
+ 1)  <= cachedCf.getColumnCount()))
{noformat}
i.e. unless the whole partition is cached we should not return the cached value for anything
other than a head filter.
* In CFS.getThroughCache, on a cache miss and in the case where {{filter.filter.getCount()
> metadata.getRowsPerPartitionToCache().rowsToCache}}, we shouldn't cache the result if
the filter 'finish' is not empty and the query has yield less rows than the cache capacity.
* From a naming point of view, I'll note that rows_per_partition_to_cache is probably confusing
on the thrift side.  Maybe for thrift it's worth naming it something like columns_per_rows_to_cache.
 It won't be entirely correct technically because internally we'll store a number of CQL rows
per partitions, but in most cases this will be the same thing anyway so it's maybe fine.
* For SliceQueryFilter.isHeadFilter, we want to exclude queries with multiple ColumnSlice
for now, and while the implementation of the patch does that, it looks more like a side effect
than the actual intent. I think something like
{noformat}
public boolean isHeadFilter()
{
    return slices.length == 1 && slice[0].start.isEmpty();
}
{noformat}
would make the intent more clear.
* In RowIteratorFactory, we could keep using the cache if we cache ALL. Or even really if
the cache is big enough to satisfy the filter, like we do in getThroughCache.
* In ColumnFamilyMetrics, any reason why only rowCacheHitOutOfRange is removed on release()?
* Nit: seems like shouldCache() in NamesQueryFilter is not used.

As a side note, maybe this is a good time to rename the 'row cache' into 'partition cache'
(I'm thinking options in cassandra.yaml, metrics names, ...)? Could be done in a separate
issue though.



> Query cache / partition head cache
> ----------------------------------
>
>                 Key: CASSANDRA-5357
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>             Fix For: 2.1
>
>         Attachments: 0001-Cache-a-configurable-amount-of-columns.patch
>
>
> I think that most people expect the row cache to act like a query cache, because that's
a reasonable model.  Caching the entire partition is, in retrospect, not really reasonable,
so it's not surprising that it catches people off guard, especially given the confusion we've
inflicted on ourselves as to what a "row" constitutes.
> I propose replacing it with a true query cache.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message