From: Joe Stump
To: cassandra-user@incubator.apache.org
Subject: Re: using cassandra as a real time DW
Date: Fri, 6 Nov 2009 14:46:29 -0700

On Nov 6, 2009, at 2:35 PM, Mark Robson wrote:

> 2009/11/6 Joe Stump
>
> Can you explain what you mean by lack of load balancing?
>
>
> Nothing in Cassandra attempts to ensure that your data are equally
> spread over the different nodes (yet; there are several bugs open to
> this effect).

That's not true from my understanding. It won't put three copies on
the same node. The key word, I suppose, is "equally".

> If you use the OrderedPartitioner, in all likelihood your data will
> be very unevenly spread to the point where most of your servers
> aren't used at all. This obviously doesn't scale.
>
> The RandomPartitioner is better because the hashing it does causes
> data to spread out, but the tokens are still chosen randomly so
> there's no way to guarantee that machines get equal or even
> similar(ish) amounts of data.

We've answered this by creating our own partitioners, which Cassandra
makes pluggable. Took one of our guys about two full days to have
something up and running. Also, there's no way to guarantee anything
for the most part in distributed computing.

I think you're misleading people, though, with the notion that a.
Cassandra doesn't have load balancing (it does, in many ways) and b.
It doesn't scale. Digg and Facebook both use it in production and
while it might not be battle hardened and fully tested, it's
definitely working for them well under high load.

--Joe
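
The point in the thread about randomly chosen tokens can be illustrated with a small sketch. It is not code from Cassandra or from anyone in this thread: the ring size of 2**127 and the helper names (ownership, random_tokens, even_tokens) are assumptions made for illustration only. It compares the share of a hash ring each node owns when tokens are picked at random with the share when tokens are spaced evenly.

```python
import random

# Assumed size of the hash token space; chosen for illustration only.
RING_SIZE = 2 ** 127


def ownership(tokens, ring_size=RING_SIZE):
    """Return the fraction of the ring owned by each token, in sorted order.

    Each node is assumed to own the range from the previous token
    (exclusive) up to its own token (inclusive), wrapping around the ring.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return [1.0]  # a single node owns the whole ring
    shares = []
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # for i == 0 this wraps to the last token
        shares.append(((tok - prev) % ring_size) / ring_size)
    return shares


def random_tokens(n, rng, ring_size=RING_SIZE):
    """Pick n tokens uniformly at random, as the thread describes."""
    return [rng.randrange(ring_size) for _ in range(n)]


def even_tokens(n, ring_size=RING_SIZE):
    """Space n tokens evenly around the ring."""
    return [i * ring_size // n for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(2009)  # fixed seed so runs are repeatable
    for label, tokens in (("random", random_tokens(6, rng)),
                          ("even", even_tokens(6))):
        shares = ownership(tokens)
        print(label, " ".join(f"{s:.3f}" for s in shares))
```

Hashing spreads keys uniformly over the ring, so a node's share of the ring approximates its share of the data. With random tokens the shares usually differ noticeably between nodes; with evenly spaced tokens they come out equal. A custom token-placement policy, like the pluggable partitioners mentioned above, is one way to control this.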