cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: how large cassandra could scale when it need to do manual operation?
Date Fri, 08 Jul 2011 21:22:58 GMT
AFAIK Facebook's Cassandra and Apache Cassandra diverged a long time ago. Twitter is a vocal supporter with a large Apache Cassandra install, e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half dozen clusters." http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011


If you are working with a 3 node cluster, removing/rebuilding/whatever-ing one node will affect 33% of your capacity. When you scale up, the contribution from each individual node goes down, and the impact of one node going down is smaller. Problems that happen with a few nodes go away at scale, to be replaced by a whole new set of problems.

> 1): load balancing needs to be performed manually on every node, according to:

Yes
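For example, running the function you quoted for a 3 node cluster gives these evenly spaced RandomPartitioner tokens, which you then assign to the nodes (e.g. via initial_token in cassandra.yaml or nodetool move):

    >>> def tokens(nodes):
    ...     for x in xrange(nodes):
    ...         print 2 ** 127 / nodes * x
    ...
    >>> tokens(3)
    0
    56713727820156410577229101238628035242
    113427455640312821154458202477256070484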
	
> 2): when adding new nodes, you need to perform node repair and cleanup on every node
You only need to run cleanup, see http://wiki.apache.org/cassandra/Operations#Bootstrap
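A minimal sketch of scripting that step rather than running it by hand on each box (the host names and the use of ssh are placeholders, adapt to however you reach your nodes):

    import subprocess

    # the nodes that were already in the ring before the new one bootstrapped
    existing_nodes = ["cass1.example.com", "cass2.example.com", "cass3.example.com"]

    for host in existing_nodes:
        # nodetool cleanup removes the data a node no longer owns
        # once the new node has taken over part of its range
        subprocess.check_call(["ssh", host, "nodetool", "cleanup"])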

> 3) when decommissioning a node, there is a chance it slows down the entire cluster (not sure why, but I have seen people asking about it), and the only way to fix it is to shut down the entire cluster, rsync the data, and start all nodes without the decommissioned one.

I cannot remember any specific cases where decommission requires a full cluster stop, do you have a link? With regard to slowing down, the decommission process streams data from the node you are removing onto the other nodes, and this can slow down the nodes receiving the streams (I think it's more intelligent now about what is moved). This will be exaggerated in a 3 node cluster, as you are removing 33% of the processing capacity and adding some (temporary) extra load to the remaining nodes.

> after all, I think there is a lot of human work needed to maintain the cluster, which makes it impossible to scale to thousands of nodes,
Automation, Automation, Automation is the only way to go.

Chef, Puppet, or CFEngine for general config and deployment; Cloudkick, munin, ganglia, etc. for monitoring; and OpsCenter (http://www.datastax.com/products/opscenter) for Cassandra-specific management.

> I hope I am totally wrong about all of this; currently I am serving 1 million pv every day with Cassandra and it makes me feel unsafe. I am afraid that one day one node crash will corrupt the data and the whole cluster will go wrong....
With RF 3 and a 3 node cluster you have room to lose one node and the cluster will still be up for 100% of the keys. While that's better than having to worry about *the* database server, it's still entry level fault tolerance. With RF 3 in a 6 node cluster you can lose up to 2 nodes and still be up for 100% of the keys, since every key has 3 replicas and at most 2 of them can be on the failed nodes.
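If you want to convince yourself of that, here is a small sketch (assuming SimpleStrategy style placement, i.e. each key lives on RF consecutive nodes around the ring):

    from itertools import combinations

    def all_keys_available(nodes, rf, failed):
        # replica set for each ring position: rf consecutive nodes
        replica_sets = [set((i + j) % nodes for j in range(rf)) for i in range(nodes)]
        # a key is still readable (at CL ONE) if at least one replica is up
        return all(rs - failed for rs in replica_sets)

    # RF 3 on 6 nodes: any 2 failures still leave every key available
    print all(all_keys_available(6, 3, set(f)) for f in combinations(range(6), 2))  # True
    # but some combinations of 3 failures do take keys offline
    print all(all_keys_available(6, 3, set(f)) for f in combinations(range(6), 3))  # False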

Is there something you are specifically concerned about with your current installation?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jul 2011, at 08:50, Yan Chunlu wrote:

> hi, all:
> I am curious about how large Cassandra can scale.
> 
> From the information I can find, the largest deployment is at Facebook, which is about 150 nodes. Meanwhile they are using 2000+ nodes with Hadoop, and Yahoo is even using 4000 Hadoop nodes.
> 
> I do not understand why this is the situation; I have only a little knowledge of Cassandra and no knowledge of Hadoop.
> 
> 
> 
> Currently I am using Cassandra with 3 nodes and having problems bringing one back after it got out of sync; the problems I encountered make me worry about how Cassandra could scale out:

> 
> 1): load balancing needs to be performed manually on every node, according to:
> 
> def tokens(nodes):
>     for x in xrange(nodes):
>         print 2 ** 127 / nodes * x
> 
> 
> 2): when adding new nodes, you need to perform node repair and cleanup on every node
> 
> 
> 
> 3) when decommissioning a node, there is a chance it slows down the entire cluster (not sure why, but I have seen people asking about it), and the only way to fix it is to shut down the entire cluster, rsync the data, and start all nodes without the decommissioned one.
> 
> 
> 
> 
> 
> after all, I think there is a lot of human work needed to maintain the cluster, which makes it impossible to scale to thousands of nodes, but I hope I am totally wrong about all of this. Currently I am serving 1 million pv every day with Cassandra and it makes me feel unsafe; I am afraid that one day one node crash will corrupt the data and the whole cluster will go wrong....

> 
> 
> 
> On the contrary, relational databases make me feel safe, but they do not scale well.

> 
> 
> 
> thanks for any guidance here.
> 

