incubator-cassandra-user mailing list archives

From Tom Chen <...@gogii.net>
Subject Re: cassandra not responding
Date Tue, 16 Mar 2010 20:51:51 GMT
Can you give some details about the use case you are using Cassandra for?
I am actually looking to store data in almost the same manner, except with
more variance in row size, 1 KB to 5 KB, across about 20 million rows.

I have been benchmarking Cassandra 0.5 versus 0.6, and 0.6 shows significant
speed improvements when I hit the cache (memory access versus random disk,
obviously). Write performance in either version is pretty damn good.


Tom
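
The sizing numbers traded back and forth in this thread (20 million rows at
1 KB-5 KB here, 100 million rows at ~4 KB with a 10% cache below) can be
sanity-checked with a back-of-the-envelope sketch. This is a minimal estimate
of raw cached-row bytes only; the function name is mine, and it deliberately
ignores per-entry JVM and cache-structure overhead, which only adds more:

```python
def cache_memory_estimate(row_count, row_size_bytes, cache_fraction):
    """Rough lower bound on row-cache memory: number of cached rows
    times average row size. Ignores per-entry overhead (JVM object
    headers, cache bookkeeping), so real usage will be higher."""
    cached_rows = int(row_count * cache_fraction)
    return cached_rows * row_size_bytes

# Todd's case from the thread: 100 million rows, ~4 KB each, 10% row cache
todd_bytes = cache_memory_estimate(100_000_000, 4 * 1024, 0.10)

# Tom's case: 20 million rows at 1 KB to 5 KB, if fully cached
low_bytes = cache_memory_estimate(20_000_000, 1 * 1024, 1.0)
high_bytes = cache_memory_estimate(20_000_000, 5 * 1024, 1.0)

print(todd_bytes // 2**30, "GiB")  # roughly the ~40 GB figure below
```

Even the low end of either workload dwarfs a typical 2010-era heap, which is
the point Nate makes below about leaning on the filesystem cache instead.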


On Tue, Mar 16, 2010 at 1:40 PM, B. Todd Burruss <bburruss@real.com> wrote:

> i only anticipate about 2,000,000 hot rows, each with about 4k of data.
>  however, we will have a LOT of rows that just aren't used.  right now, the
> data is just one column with a blob of text in it.  but i have new data
> coming in constantly, so not sure how this affects the cache, etc.  i'm
> skeptical about using any cache really, and just rely on the OS (as you
> mentioned.)  i've been trying this out to see if there's a performance gain
> somewhere, but i'm not seeing it.
>
>
> Nathan McCall wrote:
>
>> The cache is a "second-chance FIFO" from this library:
>>
>> http://code.google.com/p/concurrentlinkedhashmap/source/browse/trunk/src/java/com/reardencommerce/kernel/collections/shared/evictable/ConcurrentLinkedHashMap.java
>>
>> That sounds like an awful lot of churn given the size of the queue and
>> the number of references it might keep for the second-chance stuff.
>> How big of a hot data set do you need to maintain? The amount of
>> overhead for such a large record set may not buy you anything over
>> just relying on the file system cache and turning down the heap size.
>>
>> -Nate
>>
>> On Tue, Mar 16, 2010 at 1:17 PM, B. Todd Burruss <bburruss@real.com>
>> wrote:
>>
>>
>>> i think i better make sure i understand how the row/key cache works.  i
>>> currently have both set to 10%.  so if cassandra needs to read data from an
>>> sstable that has 100 million rows, it will cache 10,000,000 rows of data
>>> from that sstable?  so if my row is ~4k, then we're looking at ~40gb used
>>> by cache?
>>>
>>>
>>>
>>
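
For readers unfamiliar with the "second-chance FIFO" policy Nate mentions:
entries are evicted in insertion order, but a read sets a reference bit that
buys the entry one reprieve, requeuing it at the tail with the bit cleared.
This is a minimal sketch of that policy, assuming the textbook algorithm; it
is not the concurrentlinkedhashmap implementation linked above, which adds
lock-free concurrency on top of the same idea:

```python
from collections import OrderedDict

class SecondChanceCache:
    """Sketch of a second-chance FIFO cache. Entries queue in insertion
    order; a get() sets a reference bit that grants one reprieve when
    the entry reaches the head of the eviction queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> (value, referenced_bit)

    def get(self, key):
        if key not in self.entries:
            return None
        value, _ = self.entries[key]
        self.entries[key] = (value, True)  # mark as recently referenced
        return value

    def put(self, key, value):
        if key in self.entries:
            self.entries[key] = (value, True)
            return
        while len(self.entries) >= self.capacity:
            head_key, (head_val, referenced) = next(iter(self.entries.items()))
            del self.entries[head_key]
            if referenced:
                # second chance: requeue at the tail, bit cleared
                self.entries[head_key] = (head_val, False)
            else:
                break  # this entry is evicted; there is room now
        self.entries[key] = (value, False)
```

The churn Nate describes follows from put() potentially cycling through many
referenced entries before finding a victim, which is why a very large hot set
behind this policy can cost more than it saves versus the OS page cache.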
