From Tom Chen <>
Subject Re: cassandra not responding
Date Tue, 16 Mar 2010 20:51:51 GMT
Can you give some details about the use case that you are using cassandra
for? I am actually looking to store data in almost the same manner,
except with more variance in row size (1k to 5k), with about 20 million

I have been benchmarking cassandra on 5 versus 6, and v6 has significant
speed improvements if I hit the cache (obviously memory access versus random
disk.)  Write performance in either version is pretty damn good.


On Tue, Mar 16, 2010 at 1:40 PM, B. Todd Burruss <> wrote:

> i only anticipate about 2,000,000 hot rows, each with about 4k of data.
>  however, we will have a LOT of rows that just aren't used.  right now, the
> data is just one column with a blob of text in it.  but i have new data
> coming in constantly, so not sure how this affects the cache, etc.  i'm
> skeptical about using any cache really, and just rely on the OS (as you
> mentioned.)  i've been trying this out to see if there's a performance gain
> somewhere, but i'm not seeing it.
> Nathan McCall wrote:
>> The cache is a "second-chance FIFO" from this library:
>> That sounds like an awful lot of churn given the size of the queue and
>> the number of references it might keep for the second-chance stuff.
>> How big of a hot data set do you need to maintain? The amount of
>> overhead for such a large record set may not buy you anything over
>> just relying on the file system cache and turning down the heap size.
>> -Nate
>> On Tue, Mar 16, 2010 at 1:17 PM, B. Todd Burruss <>
>> wrote:
>>> i think i better make sure i understand how the row/key cache works.  i
>>> currently have both set to 10%.  so if cassandra needs to read data from
>>> an sstable that has 100 million rows, it will cache 10,000,000 rows of
>>> data from that sstable?  so if my row is ~4k, then we're looking at ~40gb
>>> used by cache?
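
For reference, the arithmetic behind the ~40gb figure quoted above can be
checked with a rough sketch like the one below. It assumes the percentage
applies to the row count and that "4k" means 4096 bytes of raw payload per
row; it ignores JVM object overhead and key sizes, so a real heap footprint
would likely be larger. The function name and parameters are made up for
illustration and are not part of cassandra.

```python
def estimate_cache_bytes(total_rows, cache_percent, avg_row_bytes):
    """Return (cached_rows, raw_bytes) for a percentage-sized row cache.

    Hypothetical helper: counts payload bytes only, no JVM overhead.
    """
    # Integer math avoids floating-point rounding on the row count.
    cached_rows = total_rows * cache_percent // 100
    return cached_rows, cached_rows * avg_row_bytes


ROW_BYTES = 4 * 1024  # assumption: "4k" read as 4096 bytes

# Figures from the thread: 100 million rows, cache set to 10%.
cached_rows, cache_bytes = estimate_cache_bytes(100_000_000, 10, ROW_BYTES)

# Figures from the thread: about 2,000,000 hot rows of about 4k each.
hot_set_bytes = 2_000_000 * ROW_BYTES

print(f"cached rows: {cached_rows:,}")
print(f"cache size:  {cache_bytes / 1e9:.1f} GB")
print(f"hot set:     {hot_set_bytes / 1e9:.1f} GB")
```

Under these assumptions the 10% setting holds roughly five times more row
data than the anticipated hot set, which fits the suggestion in the thread
to rely on the file system cache instead.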

