incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Alternative Row Cache Implementation
Date Fri, 01 Jul 2011 01:46:00 GMT
I'm interested. :)

On Thu, Jun 30, 2011 at 11:44 AM, Daniel Doubleday
<daniel.doubleday@gmx.net> wrote:
> Hi all - or rather devs
>
> We have been working on an alternative to the existing row cache implementation(s).
>
> We have 2 main goals:
>
> - Decrease memory -> get more rows in the cache without suffering a huge performance
>   penalty
> - Reduce gc pressure
>
> This sounds a lot like we should be using the new serializing cache in 0.8.
> Unfortunately our workload consists of loads of updates which would invalidate the cache
> all the time.
>
> The second unfortunate thing is that the idea we came up with doesn't fit the new cache
> provider api...
>
> It looks like this:
>
> Like the serializing cache, we basically cache only the serialized byte buffer. We don't
> serialize the bloom filter, and we try to do some other minor compression tricks (varints
> etc.; not done yet). The main difference is that we don't deserialize but use the normal
> sstable iterators and filters, as in the regular uncached case.
>
> So the read path looks like this:
>
> return filter.collectCollatedColumns(memtable iter, cached row iter)
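A minimal sketch of the read path described above, assuming a simplified column model. Everything here (the `Column` class, the buffer layout, `cachedRowIterator`, `collate`) is hypothetical and only illustrates the idea of iterating a cached serialized buffer lazily and collating it with memtable columns; it is not Cassandra's actual `collectCollatedColumns` code, and it ignores tombstones and filters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class CachedRowReadSketch {
    /** Hypothetical column: name, value and write timestamp. */
    public static final class Column {
        public final String name;
        public final String value;
        public final long timestamp;
        public Column(String name, String value, long timestamp) {
            this.name = name; this.value = value; this.timestamp = timestamp;
        }
    }

    static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    static String getString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Serializes name-sorted columns into one buffer; the cache would hold only this. */
    public static ByteBuffer serialize(List<Column> sortedColumns) {
        int size = 4; // column count
        for (Column c : sortedColumns) {
            size += 4 + c.name.getBytes(StandardCharsets.UTF_8).length
                  + 4 + c.value.getBytes(StandardCharsets.UTF_8).length + 8;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(sortedColumns.size());
        for (Column c : sortedColumns) {
            putString(buf, c.name);
            putString(buf, c.value);
            buf.putLong(c.timestamp);
        }
        buf.flip();
        return buf;
    }

    /** Walks the cached buffer one column at a time; the row is never materialized up front. */
    public static Iterator<Column> cachedRowIterator(ByteBuffer cached) {
        final ByteBuffer buf = cached.duplicate(); // own position, shared bytes
        final int count = buf.getInt();
        return new Iterator<Column>() {
            private int read = 0;
            public boolean hasNext() { return read < count; }
            public Column next() {
                if (!hasNext()) throw new NoSuchElementException();
                read++;
                String name = getString(buf);
                String value = getString(buf);
                return new Column(name, value, buf.getLong());
            }
        };
    }

    static Column advance(Iterator<Column> it) { return it.hasNext() ? it.next() : null; }

    /** Merges two name-sorted iterators; on a name clash the newer timestamp wins. */
    public static List<Column> collate(Iterator<Column> memtable, Iterator<Column> cached) {
        List<Column> out = new ArrayList<>();
        Column m = advance(memtable);
        Column c = advance(cached);
        while (m != null || c != null) {
            int cmp = (m == null) ? 1 : (c == null) ? -1 : m.name.compareTo(c.name);
            if (cmp < 0) { out.add(m); m = advance(memtable); }
            else if (cmp > 0) { out.add(c); c = advance(cached); }
            else {
                out.add(m.timestamp >= c.timestamp ? m : c);
                m = advance(memtable);
                c = advance(cached);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer cached = serialize(Arrays.asList(
                new Column("a", "old", 1L), new Column("c", "c1", 1L)));
        List<Column> memtable = Arrays.asList(
                new Column("a", "new", 2L), new Column("b", "b1", 2L));
        for (Column col : collate(memtable.iterator(), cachedRowIterator(cached))) {
            System.out.println(col.name + "=" + col.value + " @" + col.timestamp);
        }
    }
}
```

Because the cached side is only an iterator over bytes, the long-lived heap object per cached row is a single buffer rather than one object per column.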
>
> The write path is not affected. It does not update the cache.
>
> During flush we merge all memtable updates with the cached rows.
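The flush-time merge could look roughly like the sketch below. It rests on assumptions the message does not state: the names (`Column`, `mergeOnFlush`, the `DataOutputStream` layout) are hypothetical, rows that are not already cached are left alone, the newer timestamp wins per column, and tombstones are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FlushMergeSketch {
    /** Hypothetical column: name, value and write timestamp. */
    public static final class Column {
        public final String name;
        public final String value;
        public final long timestamp;
        public Column(String name, String value, long timestamp) {
            this.name = name; this.value = value; this.timestamp = timestamp;
        }
    }

    /** Encodes a row as: column count, then (name, value, timestamp) per column. */
    public static byte[] serialize(Collection<Column> columns) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(columns.size());
            for (Column c : columns) {
                out.writeUTF(c.name);
                out.writeUTF(c.value);
                out.writeLong(c.timestamp);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes a cached row into a name-sorted map. */
    public static TreeMap<String, Column> deserialize(byte[] row) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(row));
            TreeMap<String, Column> columns = new TreeMap<>();
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                String value = in.readUTF();
                long timestamp = in.readLong();
                columns.put(name, new Column(name, value, timestamp));
            }
            return columns;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** On flush, folds the memtable's updates into rows that are already cached. */
    public static void mergeOnFlush(Map<String, List<Column>> memtable,
                                    Map<String, byte[]> rowCache) {
        for (Map.Entry<String, List<Column>> row : memtable.entrySet()) {
            byte[] cached = rowCache.get(row.getKey());
            if (cached == null) continue; // assumption: flush does not populate the cache
            TreeMap<String, Column> merged = deserialize(cached);
            for (Column update : row.getValue()) {
                Column existing = merged.get(update.name);
                if (existing == null || update.timestamp >= existing.timestamp) {
                    merged.put(update.name, update);
                }
            }
            rowCache.put(row.getKey(), serialize(merged.values()));
        }
    }
}
```

Merging only at flush keeps the write path untouched: between flushes the cached buffer is stale by exactly what the memtable holds, which the read path covers by collating the two.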
>
> These are early test results:
>
> - Depending on row width and value size, the serialized cache takes between 30% and 50%
>   of the memory of the cached CF. This might be optimized further.
> - Read times increase by 5-10%.
>
> We haven't tested the effects on gc but hope that we will see improvements there because
> we only cache a fraction of objects (in terms of numbers) in old gen heap which should make
> gc cheaper. Of course there's also the option to use native mem like serializing cache does.
>
> We believe that this approach is quite promising but as I said it is not compatible with
> the current cache api.
>
> So my question is: does that sound interesting enough to open a jira or has that idea
> already been considered and rejected for some reason?
>
> Cheers,
> Daniel
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
