cassandra-user mailing list archives

From Maxim Kramarenko <>
Subject Re: Cassandra CF sharding
Date Fri, 28 May 2010 13:05:11 GMT

Thank you.

For 1), I hope that processing smaller files will be easier to
monitor. Also, if we have a disk failure, we can delete just one file and
repair, for example. Actually, one CF per customer would be best (easy to
delete or back up a single customer's data, and customers are totally
independent), but Cassandra likely doesn't support 15000 CFs per keyspace.

Regarding 3) - yes, I understand.

One related question: if we can choose, which should we prefer?
5 nodes, 16 cores/16 GB/8 TB disk space each
10 nodes, 8 cores/8 GB/4 TB disk space each
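
To make the two options easier to compare, here is a minimal Python sketch that only adds up the figures quoted above. The helper name cluster_totals and the share_per_node field are illustrative, and the per-node share assumes an evenly balanced ring, which the message does not state.

```python
def cluster_totals(nodes, cores, ram_gb, disk_tb):
    """Return aggregate resources and each node's share of the cluster."""
    return {
        "cores": nodes * cores,
        "ram_gb": nodes * ram_gb,
        "disk_tb": nodes * disk_tb,
        # Assumption: tokens are evenly balanced, so each node holds 1/nodes.
        "share_per_node": 1 / nodes,
    }


few_large = cluster_totals(nodes=5, cores=16, ram_gb=16, disk_tb=8)
many_small = cluster_totals(nodes=10, cores=8, ram_gb=8, disk_tb=4)

for name, totals in (("5 large nodes", few_large), ("10 small nodes", many_small)):
    print(name, totals)
```

Both layouts come out at 80 cores, 80 GB of RAM and 40 TB of disk in total. What differs is the share carried by each node: 20% of the cluster with 5 nodes, 10% with 10 nodes.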

When is it worth running multiple Cassandra instances per node? We now
run 6 instances on 3 nodes, and it works much better than 3 instances on
the same 3 nodes. Is that the rule or an exception?

On 28.05.2010 07:11, Jonathan Ellis wrote:
> 2) is correct, but for 1) I'm not sure what manageability improvements
> you anticipate from dealing with multiple entities instead of one.
> I'm not sure what you're thinking of for 3) but routing is done by key
> only.
> 2010/5/27 Maxim Kramarenko<>:
>> Hello!
>> We have a mail archive with one large CF for mail bodies. In our case, it's
>> easy to shard the data into 5-10 CFs by customer id. We'd like to do this because:
>> 1) We get more manageable instances, because we have many small CFs instead
>> of one multi-TB CF on each node.
>> 2) Better disk space usage (we need to reserve 50% of the largest shard for
>> compaction only)
>> 3) We can manage node load not only by token, but also by defining which
>> shards are available on each node.
>> Are my assumptions correct? Any negative side effects?
