incubator-cassandra-dev mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Cache Row Size
Date Fri, 17 Feb 2012 15:05:18 GMT
Interesting.  I'm not sure what to do with that information, but interesting. :)

2012/1/16 Todd Burruss <bburruss@expedia.com>:
> I did a little more digging and a lot of the "overhead" I see in the cache
> is from the usage of ByteBuffer.  Each ByteBuffer takes 48 bytes,
> regardless of the data it represents.  so for a single IColumn stored in
> the cache, 96 bytes (one for name, one for value) are for ByteBuffer's
> needs.
>
> converting to byte[] would save a significant chunk of memory.  however I
> know the investment in ByteBuffer is significant.  creating a cache
> provider that persists the values as byte[] instead of ByteBuffer is easy,
> somewhat like the Serializing cache provider, by creating a copy of the
> row on "put".  however, saving the keys as byte[] instead of ByteBuffer
> runs a bit deeper through the code.  not sure if I want to go there.
>
> since I am randomly accessing the columns within wide rows, I need *all*
> the rows to be cached to get good performance. this is the reason for my
> desire to save as much RAM as possible.  according to my calculations, if I
> convert to byte[] this will save nearly 8gb of RAM out of the approx 25gb
> the cache is currently using.
>
> the easy fix is to simply buy more RAM and/or more machines, but wanted to
> get any feedback to see if there's something to my findings.
>
> thx
>
> fyi ... I also created some cache providers using Ehcache and
> LinkedHashMap and both exhibit about the same memory usage (in my use
> case) as ConcurrentLinkedHashCache.
>
>
>
>
> On 1/12/12 9:02 PM, "Jonathan Ellis" <jbellis@gmail.com> wrote:
>
>>The serializing cache is basically optimal.  Your problem is really
>>that row cache is not designed for wide rows at all.  See
>>https://issues.apache.org/jira/browse/CASSANDRA-1956
>>
>>On Thu, Jan 12, 2012 at 10:46 PM, Todd Burruss <bburruss@expedia.com>
>>wrote:
>>> after looking through the code it seems fairly straightforward to create
>>> some different cache providers and try some things.
>>>
>>> has anyone tried ehcache w/o persistence?  I see this JIRA
>>> https://issues.apache.org/jira/browse/CASSANDRA-1945 but the main
>>> complaint was the disk serialization, which I don't think anyone wants.
>>>
>>>
>>> On 1/12/12 6:18 PM, "Jonathan Ellis" <jbellis@gmail.com> wrote:
>>>
>>>>8x is pretty normal for JVM and bookkeeping overhead with the CLHCP.
>>>>
>>>>The SerializingCacheProvider is the default in 1.0 and is much
>>>>lighter-weight.
>>>>
>>>>On Thu, Jan 12, 2012 at 6:07 PM, Todd Burruss <bburruss@expedia.com>
>>>>wrote:
>>>>> I'm using ConcurrentLinkedHashCacheProvider and my data on disk is
>>>>> about 4gb, but the RAM used by the cache is around 25gb.  I have 70k
>>>>> columns per row, and only about 2500 rows, so a lot more columns than
>>>>> rows.  has there been any discussion or JIRAs about reducing the
>>>>> size of the cache?  I can understand the overhead for column names,
>>>>> etc, but the ratio seems a bit distorted.
>>>>>
>>>>> I'm tracing through the code, so any pointers to help me understand are
>>>>> appreciated
>>>>>
>>>>> thx
>>>>
>>>>
>>>>
>>>>--
>>>>Jonathan Ellis
>>>>Project Chair, Apache Cassandra
>>>>co-founder of DataStax, the source for professional Cassandra support
>>>>http://www.datastax.com
>>>
>>
>>
>>
>>--
>>Jonathan Ellis
>>Project Chair, Apache Cassandra
>>co-founder of DataStax, the source for professional Cassandra support
>>http://www.datastax.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
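
The two ideas Todd describes above (copying a ByteBuffer's contents into a plain byte[] on "put", and counting 48 bytes of ByteBuffer overhead twice per column) can be sketched roughly as follows. This is only an illustration, not code from Cassandra: the class and method names are hypothetical, the 48-byte figure is the one reported in this thread rather than a JVM guarantee, and the column count in main is an arbitrary example value.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of Cassandra.
final class CacheOverheadSketch {
    // Per-ByteBuffer overhead as reported in the thread (an observed figure, not a JVM constant).
    static final long BYTE_BUFFER_OVERHEAD = 48;

    // Copy the readable bytes of a buffer into a fresh byte[].
    // duplicate() is used so the caller's position and limit are left untouched.
    static byte[] toByteArray(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        byte[] copy = new byte[view.remaining()];
        view.get(copy);
        return copy;
    }

    // Bytes spent on ByteBuffer objects alone: one for the name and one for the value of each column.
    static long byteBufferOverhead(long columns) {
        return columns * 2 * BYTE_BUFFER_OVERHEAD;
    }

    public static void main(String[] args) {
        ByteBuffer value = ByteBuffer.wrap(new byte[] {10, 20, 30, 40});
        value.position(1); // pretend the first byte was already consumed
        byte[] copy = toByteArray(value);
        System.out.println(copy.length);                     // prints 3
        System.out.println(value.position());                // prints 1 (unchanged)
        System.out.println(byteBufferOverhead(1_000_000L));  // prints 96000000
    }
}
```

With these assumptions, every million cached columns would carry about 96 MB of ByteBuffer bookkeeping, which is the kind of saving a byte[]-backed cache provider would be aiming for.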
