incubator-cassandra-user mailing list archives

From aaron morton <>
Subject Re: question about replicas & dynamic response to load
Date Sun, 06 Mar 2011 03:52:42 GMT
Agree. Cassandra generally assumes reasonably static cluster membership. There are some tricks
that can be done with copying SSTables, but they will only reduce the need to stream data around,
not eliminate it.
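
For a rough sense of how much streaming a shrink involves, here is a back-of-envelope sketch in Python. It assumes evenly balanced token ownership, so each node holds about raw_tb * rf / nodes_before of replica data. It also assumes everything held by a removed node has to be re-homed on a survivor. It counts only those bytes, ignoring validation and compaction. The figures in the example (5 TB of raw data, RF 3, growing 5x from 6 to 30 nodes and then shrinking back) are hypothetical.

```python
def decommission_stream_tb(raw_tb: float, rf: int, nodes_before: int, nodes_after: int) -> float:
    """Back-of-envelope estimate (TB) of replica data that must be streamed to
    the surviving nodes when shrinking a cluster from nodes_before to nodes_after.

    Assumes balanced token ownership: every node holds raw_tb * rf / nodes_before TB,
    and everything held by a removed node must be re-homed on a survivor.
    """
    if not 0 < nodes_after <= nodes_before:
        raise ValueError("nodes_after must be between 1 and nodes_before")
    if rf > nodes_after:
        raise ValueError("cannot keep rf replicas on fewer than rf nodes")
    per_node_tb = raw_tb * rf / nodes_before
    removed = nodes_before - nodes_after
    return per_node_tb * removed


# Hypothetical example: 5 TB raw at RF 3, grown 5x from 6 to 30 nodes, then shrunk back to 6.
print(decommission_stream_tb(raw_tb=5, rf=3, nodes_before=30, nodes_after=6))  # 12.0
```

Under these assumptions, about 12 of the 15 TB of replicated data (80%) has to move just to get back to the original cluster size.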

This may not suit your problem domain, but since you are on AWS, how about using
the SQS messaging service (or something similar, e.g. RabbitMQ) to smooth out your throughput? You could
then throttle the inserts into the Cassandra cluster to a fixed maximum rate and spec your hardware against
that. During peaks, the message queue can soak up the overflow.
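
A minimal sketch of the throttling side of that idea, in Python. An in-process `queue.Queue` stands in for SQS or RabbitMQ, and the `insert` callback stands in for whatever Cassandra client write you use. Both are assumptions for illustration, not any particular client's API. A token bucket caps the write rate, and anything above the cap simply stays queued until the next drain.

```python
import queue
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens per second and
    holds at most `capacity` tokens, so short bursts are allowed."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Add tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def drain(q: "queue.Queue[str]", bucket: TokenBucket,
          insert: Callable[[str], None]) -> int:
    """Write queued messages as fast as the limiter allows.
    Anything over the limit stays queued for the next call (the queue absorbs the peak)."""
    written = 0
    while not q.empty() and bucket.try_acquire():
        insert(q.get_nowait())  # stand-in for the real Cassandra write
        written += 1
    return written
```

With `rate=100` and `capacity=100`, a burst of 250 queued rows drained once per second goes out as roughly 100, 100, then 50. The write rate the cluster sees is bounded by `rate` plus a burst of up to `capacity`, which is the number you would size the hardware against. A token bucket is just one way to throttle; sleeping a fixed interval between batches would also work, with less burst tolerance.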

Hope that helps. 

On 4/03/2011, at 2:07 PM, Dan Hendry wrote:

> To some extent, the bootstrapping problem will be an issue with most
> solutions: the data has to be duplicated from somewhere. Bootstrapping
> should not cause much performance degradation unless you are already pushing
> capacity limits. It's the decommissioning problem that makes Cassandra
> somewhat problematic in your case. Say you grow your cluster 5x and then
> write to it. When shrinking the cluster again, you have to perform a proper
> decommission, which involves validating and streaming data to the remaining
> replicas: a fairly serious operation with TBs of data. In most realistic
> situations, unless the cluster is completely read-only, you can't just kill
> most of the nodes in the cluster.
> I can't really think of a good, general way to do this with just Cassandra,
> although there may be some hacktastical possibilities. I think a more
> statically sized Cassandra cluster with a variable cache layer (memcached or
> similar) in front of it is probably a better solution, although this option
> starts to fall apart in the terabytes-of-data range.
> Have you considered using S3, Amazon CloudFront or some other CDN instead
> of rolling your own solution? For immutable data, it's what they excel at.
> Cassandra has amazing write capacity and its design focus is on scaling
> writes. I would not really consider it a good tool for the job of serving
> massive amounts of static content.
> Dan
> -----Original Message-----
> From: Shaun Cutts [] 
> Sent: March-03-11 13:00
> To:
> Subject: question about replicas & dynamic response to load
> Hello,
> In our project, our usage pattern is likely to be quite variable -- high for
> a few days, then lower, etc. It could vary by as much as 10x (or more) from
> peak to "non-peak". Also, much of our data is immutable, but there is a
> considerable amount of it, perhaps in the single-digit TBs. Finally, we
> are hosting with Amazon.
> I'm looking for advice on how to vary the number of nodes dynamically, in
> order to reduce our hosting costs at non-peak times. I worry that just
> adding "new" nodes in response to demand will make things worse -- at least
> temporarily -- as the new node copies data to itself; then bringing it down
> will also cause a degradation.
> Is it possible to bring up exact copies of other nodes? Or, alternatively,
> to take down a populated node containing (only?) immutable data, then bring
> it up again when the need arises?
> Are there reference/reading materials (or blogs) concerning dynamically
> varying the number of nodes in response to demand?
> Thanks!
> -- Shaun
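
Dan's suggestion of a statically sized Cassandra cluster with a variable cache layer in front of it amounts to a read-through cache. Here is a minimal Python sketch. It assumes an in-process `OrderedDict` as a stand-in for memcached and a hypothetical `backend_get` callable as a stand-in for the Cassandra read. Because the data is immutable, entries never need invalidating; they only leave the cache through LRU eviction.

```python
from collections import OrderedDict
from typing import Callable, Optional


class ReadThroughCache:
    """LRU read-through cache in front of a slower backing store.
    Values are assumed immutable, so entries are never invalidated,
    only evicted when the cache is full."""

    def __init__(self, backend_get: Callable[[str], Optional[bytes]], max_items: int) -> None:
        self._backend_get = backend_get  # hypothetical stand-in for a Cassandra read
        self._max_items = max_items
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.backend_reads = 0  # how many requests actually reached the backing store

    def get(self, key: str) -> Optional[bytes]:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.backend_reads += 1
        value = self._backend_get(key)
        if value is not None:
            self._items[key] = value
            if len(self._items) > self._max_items:
                self._items.popitem(last=False)  # evict the least recently used entry
        return value
```

Scaling for peak load then means adding cache capacity (here, a larger `max_items`; in practice, more memcached nodes) while the Cassandra cluster underneath stays the same size.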
