From: Yan Chunlu <springrider@gmail.com>
Date: Sat, 9 Jul 2011 15:17:33 +0800
To: user@cassandra.apache.org
Subject: Re: how large cassandra could scale when it need to do manual operation?

Thank you very much for the reply, which gives me more confidence in
Cassandra. I will try the automation tools; the examples you've listed seem
quite promising!

About the decommission problem, here is the link:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html

I am also trying to deploy Cassandra across two datacenters (with 20ms
latency), so I am worried that the network latency will make things even
worse.

Maybe I misunderstood the replication factor: doesn't RF=3 mean I could
lose two nodes and still have one replica available (with 100% of the
keys), once Nodes >= 3? Besides, I am not sure what Twitter's RF setting
is, but it is possible to lose 3 nodes at the same time (Facebook once lost
photos because their RAID broke, though that rarely happens). I have a
strong urge to set RF to a very high value...

Thanks!

On Sat, Jul 9, 2011 at 5:22 AM, aaron morton wrote:

> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time
> ago. Twitter is a vocal supporter with a large Apache Cassandra install,
> e.g. "Twitter currently runs a couple hundred Cassandra nodes across a
> half dozen clusters."
> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>
> If you are working with a 3 node cluster, removing/rebuilding/whatever one
> node will affect 33% of your capacity.
> When you scale up, the contribution from each individual node goes down,
> and the impact of one node going down is less. Problems that happen with a
> few nodes will go away at scale, to be replaced by a whole set of new
> ones.
>
> 1): the load balancing needs to be manually performed on every node,
> according to:
>
> Yes.
>
> 2): when adding new nodes, you need to perform node repair and cleanup on
> every node
>
> You only need to run cleanup, see
> http://wiki.apache.org/cassandra/Operations#Bootstrap
>
> 3) when decommissioning a node, there is a chance that it slows down the
> entire cluster. (not sure why, but I saw people asking around about it.)
> and the only way to fix it is to shut down the entire cluster, rsync the
> data, and start all nodes without the decommissioned one.
>
> I cannot remember any specific case where decommission requires a full
> cluster stop, do you have a link? With regard to slowing down, the
> decommission process will stream data from the node you are removing onto
> the other nodes; this can slow down the target nodes (I think it's more
> intelligent now about what is moved). This will be exaggerated in a 3 node
> cluster, as you are removing 33% of the processing and adding some
> (temporary) extra load to the remaining nodes.
>
> after all, I think there is a lot of human work to do to maintain the
> cluster, which makes it impossible to scale to thousands of nodes,
>
> Automation, Automation, Automation is the only way to go.
>
> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
> munin, ganglia etc. for monitoring. And Ops Centre
> (http://www.datastax.com/products/opscenter) for Cassandra-specific
> management.
>
> I am totally wrong about all of this, currently I am serving 1 million
> page views every day with Cassandra and it makes me feel unsafe; I am
> afraid one day a node crash will corrupt the data and the whole cluster
> will go wrong....
>
> With RF 3 and a 3 node cluster you have room to lose one node and the
> cluster will be up for 100% of the keys. While better than having to worry
> about *the* database server, it's still entry-level fault tolerance. With
> RF 3 in a 6 node cluster you can lose up to 2 nodes and still be up for
> 100% of the keys.
>
> Is there something you are specifically concerned about with your current
> installation?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>
> hi, all:
> I am curious about how large Cassandra can scale.
>
> From the information I can get, the largest deployment is at Facebook,
> which is about 150 nodes. Meanwhile they are using 2000+ nodes with
> Hadoop, and Yahoo is even using 4000 nodes of Hadoop.
>
> I don't understand why this is the situation; I only have a little
> knowledge of Cassandra and no knowledge of Hadoop.
>
> Currently I am using Cassandra with 3 nodes and am having problems
> bringing one back after it fell out of sync. The problems I encountered
> make me worry about how Cassandra could scale out:
>
> 1): the load balancing needs to be manually performed on every node,
> according to:
>
> def tokens(nodes):
>     for x in xrange(nodes):
>         print 2 ** 127 / nodes * x
>
> 2): when adding new nodes, you need to perform node repair and cleanup on
> every node
>
> 3) when decommissioning a node, there is a chance that it slows down the
> entire cluster. (not sure why, but I saw people asking around about it.)
> and the only way to fix it is to shut down the entire cluster, rsync the
> data, and start all nodes without the decommissioned one.
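As a side note, the initial-token recipe in 1) above can be written as a small runnable Python 3 sketch (the original snippet is Python 2, where `print` is a statement and `/` on ints is floor division; returning a list instead of printing is my own choice):

```python
def tokens(nodes):
    """Evenly space initial tokens around RandomPartitioner's 0..2**127 ring."""
    return [(2 ** 127 // nodes) * x for x in range(nodes)]

# Assign each node in a 4-node cluster one of these as its initial_token:
for token in tokens(4):
    print(token)
```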
>
> After all, I think there is a lot of human work to do to maintain the
> cluster, which makes it impossible to scale to thousands of nodes, but I
> hope I am totally wrong about all of this. Currently I am serving 1
> million page views every day with Cassandra and it makes me feel unsafe;
> I am afraid one day a node crash will corrupt the data and the whole
> cluster will go wrong....
>
> On the contrary, relational databases make me feel safe, but they do not
> scale well.
>
> Thanks for any guidance here.

--
Charles
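Aaron's RF arithmetic can be sanity-checked with a short Python sketch of SimpleStrategy-style replica placement (each range's replicas live on the owning node plus the next RF-1 nodes clockwise on the ring); the helper names here are mine, not Cassandra's API:

```python
from itertools import combinations

def replicas(ring, i, rf):
    # SimpleStrategy-style placement: owner plus the next rf-1 nodes clockwise
    return {ring[(i + k) % len(ring)] for k in range(rf)}

def all_keys_available(ring, rf, down):
    # 100% of keys are served iff every range keeps at least one live replica
    return all(replicas(ring, i, rf) - down for i in range(len(ring)))

ring = ["n1", "n2", "n3", "n4", "n5", "n6"]

# RF=3 on 6 nodes: any 2 failures leave every key with a live replica
print(all(all_keys_available(ring, 3, set(d)) for d in combinations(ring, 2)))  # True

# ...but 3 consecutive failures take some ranges completely down
print(all_keys_available(ring, 3, {"n1", "n2", "n3"}))  # False
```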