cassandra-user mailing list archives

From "Dan Hendry"
Subject RE: question about replicas & dynamic response to load
Date Fri, 04 Mar 2011 01:07:27 GMT
To some extent, the bootstrapping problem will be an issue with most
solutions: the data has to be duplicated from somewhere. Bootstrapping
should not cause much performance degradation unless you are already pushing
capacity limits. It's the decommissioning problem which makes Cassandra
somewhat problematic in your case. You grow your cluster 5x, then write to
it. You have to perform a proper decommission when shrinking the cluster
again, which involves validating and streaming data to the remaining
replicas: a fairly serious operation with TBs of data. For most realistic
situations, unless the cluster is completely read-only, you can't just kill
most of the nodes in the cluster.
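
To put a rough number on "fairly serious", here is a back-of-the-envelope
sketch in Python. It is not anything Cassandra provides: the function name,
the evenly balanced ring, and the per-node streaming rate are all assumptions
made for illustration, and validation overhead is ignored, so treat the
result as an optimistic lower bound.

```python
def decommission_estimate(total_tb, nodes_before, nodes_after, stream_mb_per_s):
    """Optimistic estimate of data moved and hours needed to shrink a cluster.

    Assumptions (hypothetical, not measured):
      - total_tb is the on-disk total including replicas, spread evenly
      - every remaining node receives at stream_mb_per_s, all in parallel
      - 1 TB = 1,000,000 MB; validation and compaction costs are ignored
    """
    if not 0 < nodes_after < nodes_before:
        raise ValueError("expected 0 < nodes_after < nodes_before")

    leaving = nodes_before - nodes_after
    # Data held by the departing nodes has to land on the survivors.
    tb_to_stream = total_tb * leaving / nodes_before
    # Aggregate receive rate of the nodes that stay.
    aggregate_mb_per_s = nodes_after * stream_mb_per_s
    hours = tb_to_stream * 1_000_000 / aggregate_mb_per_s / 3600
    return tb_to_stream, hours
```

With 5 TB spread over 25 nodes, shrinking back to 5 nodes at an assumed
50 MB/s per node, this gives 4 TB to move and roughly 4.4 hours at best,
and that is before any validation work.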

I can't really think of a good, general way to do this with just Cassandra,
although there may be some hacktastical possibilities. I think a more
statically sized Cassandra cluster plus a variable cache layer (memcached or
similar) is probably a better solution. This option kind of falls apart at
the terabytes of data range.
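
For what it's worth, the cache-layer idea looks roughly like the sketch below.
It is only an illustration: an in-process LRU dictionary stands in for
memcached, `fetch` is a hypothetical callable standing in for the Cassandra
read, and the class name is made up. Because the data is immutable, no
invalidation logic is needed.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Bounded LRU cache in front of a slower backing store (illustrative only)."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch          # hypothetical stand-in for a Cassandra read
        self._capacity = capacity    # max number of cached items
        self._items = OrderedDict()  # stand-in for memcached
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            # Cache hit: mark as most recently used and return.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]

        # Cache miss: read from the backing store, then remember the value.
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return value
```

The number of cache nodes can then follow demand while the Cassandra cluster
stays the same size; only cache misses ever reach Cassandra.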

Have you considered using S3, Amazon CloudFront, or some other CDN instead
of rolling your own solution? For immutable data, it's what they excel at.
Cassandra has amazing write capacity and its design focus is on scaling
writes. I would not really consider it a good tool for the job of serving
massive amounts of static content.


-----Original Message-----
From: Shaun Cutts
Sent: March-03-11 13:00
Subject: question about replicas & dynamic response to load


In our project our usage pattern is likely to be quite variable -- high for
a few days, then lower, etc. It could vary by as much as 10x (or more) from
peak to "non-peak". Also, much of our data is immutable -- but there is a
considerable amount of it -- perhaps in the single-digit TBs. Finally, we
are hosting with Amazon.

I'm looking for advice on how to vary the number of nodes dynamically, in
order to reduce our hosting costs at non-peak times. I worry that just
adding "new" nodes in response to demand will make things worse -- at least
temporarily -- as the new node copies data to itself; then bringing it down
will also cause a degradation.

I'm wondering if it is possible to bring up exact copies of other nodes? Or
alternately to take down a populated node containing (only?) immutable data,
then bring it up again when the need arises?

Are there reference/reading materials (or blogs) concerning dynamically
varying the number of nodes in response to demand?


-- Shaun
