cassandra-commits mailing list archives

From "Daniel Doubleday (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation
Date Thu, 29 Mar 2012 12:46:32 GMT


Daniel Doubleday commented on CASSANDRA-2864:

So ... some final remarks

We have this in production and it's looking good so far.
Our cached real-world CFs are pretty skinny so far, so the reduction in memory size is only
~3.5-4x.

Latency-wise there's no difference (compared to CLHC) when keeping the max number of items equal.
So the improvement comes from being able to keep more rows in memory and therefore increase
the hit ratio or leave more memory for the page cache.

If there's any interest in this: the fork we are running lives here:

This is still a work in progress that works for us (counters and supercolumns are untested)
and allows switching the implementation via startup parameters.

Again the intention of this patch is to replace both CLHC and SC. 
Sylvain expressed concerns that this might not work for counters. Since we don't use them,
I didn't bother too much (at least until it's clear whether this is interesting for you or

Next step for me is to port to 1.1 and look at the key cache.

Please comment if you want to follow up; otherwise, feel free to close.

> Alternative Row Cache Implementation
> ------------------------------------
>                 Key: CASSANDRA-2864
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Daniel Doubleday
>            Priority: Minor
>         Attachments: rowcache.patch
> We have been working on an alternative implementation to the existing row cache(s).
> We have 2 main goals:
> - Decrease memory -> get more rows in the cache without suffering a huge performance
> - Reduce gc pressure
> This sounds a lot like we should be using the new serializing cache in 0.8. 
> Unfortunately our workload consists of loads of updates, which would invalidate the cache
> all the time.
> The second unfortunate thing is that the idea we came up with doesn't fit the new cache
> provider API...
> It looks like this:
> Like the serializing cache, we basically cache only the serialized byte buffer. We don't
> serialize the bloom filter, and we try to apply some other minor compression tricks (varints
> etc.; not done yet). The main difference is that we don't deserialize the row but read it
> with the normal sstable iterators and filters, as in the regular uncached case.
> So the read path looks like this:
> return filter.collectCollatedColumns(memtable iter, cached row iter)
> The write path is not affected; it does not update the cache.
> During flush we merge all memtable updates with the cached rows.
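
To make the read path and the flush-time merge described above concrete, here is a minimal Java sketch. It is not taken from the attached patch: the `Column` class, the byte layout, and the method names `serialize`, `iterate`, `collate`, and `mergeOnFlush` are all hypothetical, and reconciliation is simplified to "the newer timestamp wins". It only illustrates the idea of keeping the row as serialized bytes, decoding it column by column through an iterator, and collating that iterator with the memtable's.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class SerializedRowCacheSketch {

    /** Hypothetical column shape: name, value, and write timestamp. */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Serializes name-sorted columns into one buffer; the cache would hold only this. */
    static ByteBuffer serialize(List<Column> sortedColumns) {
        List<byte[]> parts = new ArrayList<>();
        int size = 0;
        for (Column c : sortedColumns) {
            byte[] n = c.name.getBytes(StandardCharsets.UTF_8);
            byte[] v = c.value.getBytes(StandardCharsets.UTF_8);
            parts.add(n);
            parts.add(v);
            size += 4 + n.length + 4 + v.length + 8; // two length prefixes + timestamp
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < sortedColumns.size(); i++) {
            byte[] n = parts.get(2 * i);
            byte[] v = parts.get(2 * i + 1);
            buf.putInt(n.length).put(n).putInt(v.length).put(v)
               .putLong(sortedColumns.get(i).timestamp);
        }
        buf.flip();
        return buf;
    }

    /** Decodes one column at a time straight from the cached bytes. */
    static Iterator<Column> iterate(ByteBuffer cached) {
        final ByteBuffer view = cached.duplicate(); // own position, shared bytes
        return new Iterator<Column>() {
            @Override public boolean hasNext() { return view.hasRemaining(); }

            @Override public Column next() {
                if (!view.hasRemaining()) throw new NoSuchElementException();
                byte[] n = new byte[view.getInt()];
                view.get(n);
                byte[] v = new byte[view.getInt()];
                view.get(v);
                long ts = view.getLong();
                return new Column(new String(n, StandardCharsets.UTF_8),
                                  new String(v, StandardCharsets.UTF_8), ts);
            }
        };
    }

    /** Read path: collate two name-sorted iterators; on a name clash the newer column wins. */
    static List<Column> collate(Iterator<Column> memtable, Iterator<Column> cachedRow) {
        List<Column> out = new ArrayList<>();
        Column m = advance(memtable);
        Column c = advance(cachedRow);
        while (m != null || c != null) {
            int cmp = (m == null) ? 1 : (c == null) ? -1 : m.name.compareTo(c.name);
            if (cmp < 0) {
                out.add(m);
                m = advance(memtable);
            } else if (cmp > 0) {
                out.add(c);
                c = advance(cachedRow);
            } else {
                out.add(m.timestamp >= c.timestamp ? m : c);
                m = advance(memtable);
                c = advance(cachedRow);
            }
        }
        return out;
    }

    /** Flush: fold the memtable's updates into the cached row and re-serialize it. */
    static ByteBuffer mergeOnFlush(List<Column> memtableColumns, ByteBuffer cached) {
        return serialize(collate(memtableColumns.iterator(), iterate(cached)));
    }

    private static Column advance(Iterator<Column> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```

In this sketch a write never touches the cached buffer; a read calls `collate` with the memtable's iterator and `iterate(cached)`, and only a flush replaces the cached bytes via `mergeOnFlush`.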
> The attached patch is based on 0.8 branch r1143352
> It does not replace the existing row cache but sits beside it. There is an environment switch
> to choose the implementation. This way it is easy to benchmark performance differences.
> -DuseSSTableCache=true enables the alternative cache. It shares its configuration with
the standard row cache. So the cache capacity is shared. 
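
As an illustration of such a startup switch, the following Java sketch selects an implementation from the `useSSTableCache` system property. Only the property name comes from the description above; the `RowCache` interface, both implementation classes, and the `select` method are hypothetical stand-ins, not the patch's actual classes. `Boolean.getBoolean` is the standard JDK call that returns true only when the named system property is set to "true" (ignoring case).

```java
final class CacheSwitchSketch {

    /** Property name taken from the -DuseSSTableCache=true flag. */
    static final String SWITCH_PROPERTY = "useSSTableCache";

    /** Hypothetical common interface for both cache implementations. */
    interface RowCache {
        int capacity();
    }

    /** Stand-in for the existing row cache. */
    static final class StandardRowCache implements RowCache {
        private final int capacity;
        StandardRowCache(int capacity) { this.capacity = capacity; }
        @Override public int capacity() { return capacity; }
    }

    /** Stand-in for the alternative, serialized-row cache. */
    static final class SSTableRowCache implements RowCache {
        private final int capacity;
        SSTableRowCache(int capacity) { this.capacity = capacity; }
        @Override public int capacity() { return capacity; }
    }

    /** Both implementations receive the same configured capacity. */
    static RowCache select(int configuredCapacity) {
        return Boolean.getBoolean(SWITCH_PROPERTY)
                ? new SSTableRowCache(configuredCapacity)
                : new StandardRowCache(configuredCapacity);
    }
}
```

Starting the JVM with `-DuseSSTableCache=true` would make `select` return the alternative cache; without the flag it falls back to the standard one, which keeps side-by-side benchmarking a matter of restarting with a different parameter.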
> We have duplicated a fair amount of code. First we actually refactored the existing sstable
> filter / reader but then decided to minimize dependencies. Also, this way it is easy to customize
> serialization for in-memory sstable rows.
> We have also experimented a little with compression but since this task at this stage
is mainly to kick off discussion we wanted to keep things simple. But there is certainly room
for optimizations.


