cassandra-dev mailing list archives

From Jonathan Ellis <>
Subject Re: Cache Row Size
Date Fri, 17 Feb 2012 15:05:18 GMT
Interesting.  I'm not sure what to do with that information, but interesting. :)

2012/1/16 Todd Burruss <>:
> I did a little more digging and a lot of the "overhead" I see in the cache
> is from the usage of ByteBuffer.  Each ByteBuffer takes 48 bytes,
> regardless of the data it represents, so for a single IColumn stored in
> the cache, 96 bytes (one ByteBuffer for the name, one for the value) go to
> ByteBuffer's needs.
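[To make the arithmetic above concrete, here is a back-of-envelope sketch. The 48-byte per-ByteBuffer figure is the thread's own measurement; the 16-byte byte[] header is an assumption for a 64-bit JVM with compressed oops, and real sizes vary by JVM, so these numbers should be measured rather than trusted.]

```java
// Back-of-envelope estimate of the ByteBuffer overhead discussed above.
// BYTEBUFFER_SHALLOW_BYTES comes from the thread; BYTE_ARRAY_HEADER_BYTES
// is an assumed array header size for a 64-bit JVM with compressed oops.
public class CacheOverheadEstimate {
    static final long BYTEBUFFER_SHALLOW_BYTES = 48;  // per the thread
    static final long BYTE_ARRAY_HEADER_BYTES = 16;   // assumed array header

    // Two buffers per IColumn: one for the name, one for the value.
    static long perColumnByteBufferOverhead() {
        return 2 * BYTEBUFFER_SHALLOW_BYTES;          // 96 bytes per column
    }

    // Switching to byte[] keeps the array header but drops the wrapper object.
    static long estimatedSavingsBytes(long columns) {
        return columns * 2 * (BYTEBUFFER_SHALLOW_BYTES - BYTE_ARRAY_HEADER_BYTES);
    }

    public static void main(String[] args) {
        long columns = 70_000L * 2_500L;  // column count quoted in this thread
        System.out.printf("per-column overhead: %d bytes%n",
                perColumnByteBufferOverhead());
        System.out.printf("estimated savings:   %.1f GB%n",
                estimatedSavingsBytes(columns) / 1e9);
    }
}
```

[Under these assumptions the savings come out around 11 GB, the same order of magnitude as the ~8 GB figure estimated in the thread.]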
> Converting to byte[] would save a significant chunk of memory.  However, I
> know the investment in ByteBuffer is significant.  Creating a cache
> provider that persists the values as byte[] instead of ByteBuffer is easy,
> somewhat like the serializing cache provider, by making a copy of the
> row on "put".  However, saving the keys as byte[] instead of ByteBuffer
> runs a bit deeper through the code, and I'm not sure I want to go there.
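[A minimal sketch of the copy-on-put idea described above: materialize each ByteBuffer's contents into a plain byte[] before caching, and wrap it back on read. The method names here are illustrative, not Cassandra's actual cache-provider API.]

```java
import java.nio.ByteBuffer;

// Sketch of copying a row's buffers to byte[] on "put" and wrapping them
// back on read. toBytes/toBuffer are hypothetical helper names.
public class ByteBufferCopy {
    // Copy the readable bytes without disturbing the caller's position.
    static byte[] toBytes(ByteBuffer bb) {
        ByteBuffer dup = bb.duplicate();   // independent position/limit
        byte[] out = new byte[dup.remaining()];
        dup.get(out);
        return out;
    }

    // Re-wrap as a read-only ByteBuffer when the cached row is read out.
    static ByteBuffer toBuffer(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asReadOnlyBuffer();
    }
}
```

[The duplicate() call is what makes the copy safe to run against a buffer another reader is iterating, since it shares the backing data but not the position.]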
> Since I am randomly accessing the columns within wide rows, I need *all*
> the rows to be cached to get good performance; this is the reason for my
> desire to save as much RAM as possible.  According to my calculations,
> converting to byte[] would save nearly 8gb of RAM out of the approx 25gb
> the cache is currently using.
> The easy fix is to simply buy more RAM and/or more machines, but I wanted
> to get any feedback to see if there's something to my findings.
> thx
> FYI ... I also created some cache providers using Ehcache and
> LinkedHashMap, and both exhibit about the same memory usage (in my use
> case) as ConcurrentLinkedHashCacheProvider.
> On 1/12/12 9:02 PM, "Jonathan Ellis" <> wrote:
>>The serializing cache is basically optimal.  Your problem is really
>>that row cache is not designed for wide rows at all.  See
>>On Thu, Jan 12, 2012 at 10:46 PM, Todd Burruss <>
>>> After looking through the code it seems fairly straightforward to create
>>> some different cache providers and try some things.
>>> Has anyone tried Ehcache w/o persistence?  I see this JIRA
>>> but the main
>>> complaint was the disk serialization, which I don't think anyone wants.
>>> On 1/12/12 6:18 PM, "Jonathan Ellis" <> wrote:
>>>>8x is pretty normal for JVM and bookkeeping overhead with the CLHCP.
>>>>The SerializingCacheProvider is the default in 1.0 and is much more
>>>>memory-efficient.
>>>>On Thu, Jan 12, 2012 at 6:07 PM, Todd Burruss <>
>>>>> I'm using ConcurrentLinkedHashCacheProvider and my data on disk is
>>>>> about 4gb, but the RAM used by the cache is around 25gb.  I have 70k
>>>>> columns per row, and only about 2500 rows, so a lot more columns than
>>>>> rows.  Has there been any discussion or JIRAs about reducing the
>>>>> size of the cache?  I can understand the overhead for column names,
>>>>> but the ratio seems a bit distorted.
>>>>> I'm tracing through the code, so any pointers to help me understand are
>>>>> appreciated.
>>>>> thx

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
