cassandra-user mailing list archives

From Yan Chunlu
Subject how large cassandra could scale when it need to do manual operation?
Date Fri, 08 Jul 2011 15:50:42 GMT
Hi all,
I am curious about how large a Cassandra cluster can scale.

From the information I can find, the largest deployment is at Facebook, with
about 150 nodes. Meanwhile, they are running 2000+ nodes of Hadoop, and Yahoo
is even running 4000 nodes of Hadoop.

I do not understand why this is the case; I have only a little knowledge of
Cassandra and none of Hadoop.

Currently I am running Cassandra with 3 nodes and am having trouble bringing
one node back after it went out of sync. The problems I have encountered make
me worry about how Cassandra can scale out:

1) Load balancing has to be performed manually on every node, according to:

    def tokens(nodes):
        # Print one evenly spaced token per node.
        for x in range(nodes):
            # Integer division keeps the tokens exact.
            print(2 ** 127 // nodes * x)

2) When adding new nodes, node repair and cleanup need to be performed on every
node.

3) When decommissioning a node, there is a chance that it slows down the entire
cluster (I am not sure why, but I have seen people asking about it), and the
only way out is to shut down the entire cluster, rsync the data, and start all
nodes except the decommissioned one.
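
To make point 1) concrete, here is a rough sketch of how the evenly spaced
tokens shift when a cluster grows. The helper names balanced_tokens and
moved_tokens are my own, and the sketch assumes that the i-th existing node
takes the i-th slot of the new ring; it only does the arithmetic and does not
talk to Cassandra.

```python
def balanced_tokens(nodes):
    """Return evenly spaced tokens for a ring of `nodes` nodes (same formula as above)."""
    return [2 ** 127 // nodes * x for x in range(nodes)]


def moved_tokens(old_nodes, new_nodes):
    """Return indexes of existing nodes whose token changes when the ring grows.

    Assumption: node i keeps position i in the new, larger ring.
    """
    old = balanced_tokens(old_nodes)
    new = balanced_tokens(new_nodes)
    return [i for i in range(old_nodes) if old[i] != new[i]]


# Growing from 3 to 4 nodes: every existing node except the first gets a new token.
print(moved_tokens(3, 4))  # prints [1, 2]
```

Under that assumption, keeping the ring balanced after adding a single node
means changing the token on almost every existing node, which is the manual
work I am worried about.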

All in all, I think there is a lot of manual work involved in maintaining the
cluster, which makes it impossible to scale to thousands of nodes. I hope I am
totally wrong about all of this. Currently I am serving 1 million page views
every day with Cassandra, and it makes me feel unsafe: I am afraid that one day
a single node crash will corrupt the data and bring down the whole cluster.

In contrast, a relational database makes me feel safe, but it does not
scale well.

Thanks for any guidance here.
