incubator-cassandra-user mailing list archives

From Faraaz Sareshwala <fsareshw...@quantcast.com>
Subject Re: row cache
Date Fri, 23 Aug 2013 02:53:06 GMT
After a bit of searching, I think I've found the answer I've been looking for. I guess I didn't
search hard enough before sending out this email. Thank you all for the responses.

According to the DataStax documentation [1], there are two types of row cache providers:

row_cache_provider
(Default: SerializingCacheProvider) Specifies what kind of implementation to use for the row
cache.
SerializingCacheProvider: Serializes the contents of the row and stores it in native memory,
that is, off the JVM Heap. Serialized rows take significantly less memory than live rows in
the JVM, so you can cache more rows in a given memory footprint. Storing the cache off-heap
means you can use smaller heap sizes, which reduces the impact of garbage collection pauses.
It is valid to specify the fully-qualified class name to a class that implements org.apache.cassandra.cache.IRowCacheProvider.
ConcurrentLinkedHashCacheProvider: Rows are cached using the JVM heap, providing the same
row cache behavior as Cassandra versions prior to 0.8.

The SerializingCacheProvider is 5 to 10 times more memory-efficient than ConcurrentLinkedHashCacheProvider
for applications that are not blob-intensive. However, SerializingCacheProvider may perform
worse in update-heavy workload situations because it invalidates cached rows on update instead
of updating them in place as ConcurrentLinkedHashCacheProvider does.
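
To make the difference described above concrete, here is a toy Python model of the two update
policies. It is only a sketch under my own assumptions, not Cassandra's code: the class names
(InvalidatingCache, UpdatingCache), the run_workload helper, and the dict standing in for disk
are all made up for illustration. It just counts hits and misses when every read is preceded by
a write to the same row.

```python
class InvalidatingCache:
    """Toy model of invalidate-on-update (the behavior the docs describe
    for SerializingCacheProvider). Hypothetical class, not a Cassandra API."""

    def __init__(self):
        self._rows = {}
        self.hits = 0
        self.misses = 0

    def read(self, key, load_from_disk):
        if key in self._rows:
            self.hits += 1
            return dict(self._rows[key])
        # Cache miss: fetch the row and keep a private copy of it.
        self.misses += 1
        row = load_from_disk(key)
        self._rows[key] = dict(row)
        return dict(row)

    def on_write(self, key, columns):
        # Drop the cached row; the next read has to reload it.
        self._rows.pop(key, None)


class UpdatingCache(InvalidatingCache):
    """Toy model of update-in-place (the behavior the docs describe
    for ConcurrentLinkedHashCacheProvider). Hypothetical class."""

    def on_write(self, key, columns):
        # Merge the write into the cached row, but only if it is already cached.
        if key in self._rows:
            self._rows[key].update(columns)


def run_workload(cache, rounds=5):
    """Warm the cache once, then alternate one write and one read per round."""
    disk = {"row1": {"c0": 0}}  # stand-in for the on-disk row
    cache.read("row1", disk.__getitem__)
    for i in range(1, rounds + 1):
        columns = {"c%d" % i: i}
        disk["row1"].update(columns)
        cache.on_write("row1", columns)
        # Either policy must still return the current row contents.
        assert cache.read("row1", disk.__getitem__) == disk["row1"]
    return cache.hits, cache.misses


if __name__ == "__main__":
    print("invalidate on update:", run_workload(InvalidatingCache()))
    print("update in place:     ", run_workload(UpdatingCache()))
```

In this toy workload the invalidating policy never gets a cache hit, because each write throws
away the row that the following read needs, while the updating policy misses only on the first
read. Real workloads mix reads and writes differently, so treat this as an illustration of the
trade-off rather than a benchmark.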


The off-heap row cache provider (SerializingCacheProvider) does indeed invalidate cached rows on update. We're going to look into using
the ConcurrentLinkedHashCacheProvider. Time to read some source code! :)

Faraaz

[1] http://www.datastax.com/documentation/cassandra/1.2/webhelp/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__row_cache_provider




On Thursday, August 22, 2013 at 7:40 PM, Boris Yen wrote:

> If you are using off-heap memory for row cache, "all writes invalidate the entire row" should be correct.
> 
> Boris
> 
> 
> On Fri, Aug 23, 2013 at 8:32 AM, Robert Coli <rcoli@eventbrite.com> wrote:
> > On Wed, Aug 14, 2013 at 10:56 PM, Faraaz Sareshwala <fsareshwala@quantcast.com> wrote:
> > > All writes invalidate the entire row (updates throw out the cached row)
> > This is not correct. Writes are added to the row, if it is in the row cache. If it's not in the row cache, the row is not added to the cache.
> >  
> > Citation from jbellis on stackoverflow, because I don't have time to find a better one and the code is not obvious about it:
> > 
> > http://stackoverflow.com/a/12499422 
> > 
> > > I have yet to go through the source code for the row cache. I do plan to do that. Can someone point me to documentation on the row cache internals? All I've found online so far is a little discussion about it and how to enable it.
> > 
> > There is no such documentation, or at least if it exists I am unaware of it.
> > 
> > In general, the rule of thumb is that the Row Cache should not be used unless the rows in question are:
> > 
> > 1) Very hot in terms of access
> > 2) Uniform in size
> > 3) "Small"
> > 
> > =Rob  

