incubator-cassandra-user mailing list archives

From Jose Juarez-Comboni <jmj...@yahoo.com>
Subject Re: Sizing a Cassandra cluster
Date Thu, 24 Mar 2011 21:04:13 GMT

Aaron,

How did you get to 1280 writes/sec? Counting 64 writers each taking 5ms for a write cycle,
assuming real parallel access with no speed hits, I get 12,800 writes/sec. Am I missing something?
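
A minimal sketch of the arithmetic in question, assuming perfectly parallel workers and a fixed per-request latency with no other bottleneck. The function name `max_ops_per_sec` is hypothetical; the worker counts and the 5ms latency are the ones quoted in the thread.

```python
def max_ops_per_sec(workers: int, latency_ms: float) -> float:
    """Upper bound on throughput: each worker completes 1000/latency_ms requests per second."""
    return workers * (1000.0 / latency_ms)


# Figures from the thread: 8 cores * 8 writers per core, 1 spindle * 16 readers per spindle.
writers = 8 * 8
readers = 1 * 16

print(max_ops_per_sec(writers, 5))  # 12800.0
print(max_ops_per_sec(readers, 5))  # 3200.0
```

Under these idealized assumptions the ceiling scales linearly with the number of workers and inversely with latency, so a measured latency from a benchmark can be plugged in directly.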

From Jose's iPhone

On Mar 24, 2011, at 2:52 PM, aaron morton <aaron@thelastpickle.com> wrote:

> Big old guess of something in the 1000's. 
> 
> Try benchmarking your workload and plug the numbers (my 5ms is pretty high) in...
> 
> - 8 cores * 8 writers per core = 64 if each write request takes 5ms  = 1280 max per sec
> - 1 spindle * 16 readers per spindle = 16 readers if each read request takes 5ms =  320 max per sec
> (reader and writer sizes from the help in conf/cassandra.yaml)
> 
> This is really just a guess, there are a lot more things going on in the system and it
> gets even more complicated once it's turned on. But I know sometimes you just need to show
> you've thought about it :)
> 
> Hope that helps.
> Aaron
> 
> On 25 Mar 2011, at 02:27, Brian Fitzpatrick wrote:
> 
>> Thanks for the tips on the replication factor.  Any thoughts on the
>> number of nodes in a cluster to support an RF=3 with a workload of 400
>> ops/sec (4-8K sized rows, 50/50 read/write)?  Based on the "sweet
>> spot" hardware referenced in the wiki (8-core, 16-32GB RAM), what kind
>> of ops/sec could I reasonably expect from each node?  Just looking for
>> a range to make some educated guesses.
>> 
>> Thanks,
>> Brian
>> 
>> On Wed, Mar 23, 2011 at 9:04 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>> It really does depend on what your workload is like, and in the end will
>>> involve a certain amount of fudge factor.
>>> 
>>> http://wiki.apache.org/cassandra/CassandraHardware provides some guidance.
>>> http://wiki.apache.org/cassandra/MemtableThresholds can be used to get a
>>> rough idea of the memory requirements. Note that secondary indexes are also
>>> CF's with the same memory settings as the parent.
>>> With RF3 you can afford to lose one replica for a key or a token range and
>>> still be available (Assuming Quorum CL). With RF 5 you can lose 2 replicas
>>> and still be available for the keys in the range.
>>> I've been careful to say "lose X replicas" because the other nodes in the
>>> cluster don't count when considering an operation for a key. Two examples, 9
>>> node cluster with RF3. If you lose nodes 2 and 3 and they are replicas for
>>> node 1, Quorum operations on keys in the range for node 1 will fail (ranges
>>> for 2 and 3 will be ok). If you lose nodes 2 and 5 Quorum operations will
>>> succeed for all keys.
>>> RF 3 is a reasonable starting point for some redundancy, RF 5 is more. After
>>> that it's Web Scale (tm).
>>> Hope that helps
>>> Aaron
>>> 
>>> On 24 Mar 2011, at 04:04, Brian Fitzpatrick wrote:
>>> 
>>> I'm going through the process of speccing out the hardware for a
>>> Cassandra cluster. The relevant specs:
>>> 
>>> - Support 460 operations/sec (50/50 read/write workload). Row size
>>> ranges from 4 to 8K.
>>> - Support 29 million objects for the first year
>>> - Support 365 GB storage for the first year, based on Cassandra tests
>>> (data + index + overhead * replication factor of 3)
>>> 
>>> I'm looking for advice on the node size for this cluster, recommended
>>> RAM per node, and whether RF=3 seems to be a good choice for general
>>> availability and resistance to failure.
>>> 
>>> I've looked at the YCSB benchmark paper and through the archives of
>>> this email list looking for pointers.  I haven't found any general
>>> guidelines on recommended cluster size to support X operations/sec
>>> with Y data size at RF factor of Z, that I could extrapolate from.
>>> 
>>> Any and all recommendations appreciated.
>>> 
>>> Thanks,
>>> Brian
>>> 
>>> 
> 
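
The quorum examples in the quoted reply (9-node cluster, RF 3) can be checked with a small sketch. It assumes nodes numbered 1 to 9 on a ring and that the replicas for a node's range are that node plus the next two nodes clockwise; that placement rule and the helper names `replicas_for` and `failed_ranges` are assumptions for illustration, not something the thread specifies.

```python
def replicas_for(owner, num_nodes=9, rf=3):
    """Assumed placement: the range owner plus the next rf-1 nodes on the ring."""
    return [((owner - 1 + i) % num_nodes) + 1 for i in range(rf)]


def failed_ranges(down, num_nodes=9, rf=3):
    """Return the range owners whose keys can no longer reach a quorum of replicas."""
    quorum = rf // 2 + 1  # 2 of 3 for RF 3, 3 of 5 for RF 5
    failed = []
    for owner in range(1, num_nodes + 1):
        alive = [n for n in replicas_for(owner, num_nodes, rf) if n not in down]
        if len(alive) < quorum:
            failed.append(owner)
    return failed


print(failed_ranges({2, 3}))  # [1, 2]
print(failed_ranges({2, 5}))  # []
```

Losing nodes 2 and 5 leaves every range with two live replicas, so Quorum operations succeed everywhere. Losing nodes 2 and 3 breaks Quorum for the range of node 1; under this assumed placement the range of node 2 (replicas 2, 3 and 4) is affected as well, so exactly which ranges fail depends on how replicas are actually placed.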
