From: Jonathan Mischo <jmischo@quagility.com>
To: cassandra-user@incubator.apache.org
Subject: Replication Strategies WAS: New Features - Future releases
Date: Sat, 19 Sep 2009 15:10:49 -0500

On Sep 18, 2009, at 9:55 PM, Jonathan Ellis wrote:

> On Fri, Sep 18, 2009 at 9:09 PM, Jonathan Mischo <jmischo@quagility.com> wrote:
>>>> • Multiple data center replication in the background.
>>>> maybe a multi master type thing
>>>
>>> It already has this. It was built from the ground up for this.
>>> It's highly tolerant to partitioning and has always available writes.
>>> All replication is done in the background (unless you specifically
>>> set a write to a high consistency level).
>>
>> You know, it does and it doesn't. RackAwareStrategy isn't a true N+1
>> scaling solution. Currently, RackAwareStrategy only guarantees that
>> it will try to replicate data to one other data center and/or one
>> other rack, depending on the number of replicas specified.
>
> Yes; that's what it's supposed to do, and it's satisfying a very real
> use case: "I want my data's primary data center to be DC A, but I want
> one replica in DC B in case A is completely unavailable."
>
> Other use cases can use different Strategies. That's why they're
> pluggable.
> It's not one-size-fits-all and it's not supposed to be.

Yeah, you're right: if N+1 is a concern, it should probably be a
separate strategy, unless we can keep the complexity virtually the
same, because of how heavily that code path is called.
RackAwareStrategy is perfectly fine for what it does: guarantee a
replica in a different DC, and/or a replica in a different rack after
that, if you configure it to store more than one replica. Above 3
replicas it can start to get unbalanced, though, since anything past
the third replica is placed by just iterating through the node list,
which has no balancing logic of its own (see the sketch at the end of
this mail). We could probably just document that limitation for
RackAwareStrategy.

I know we're trying to solve for the biggest wins for the effort, but
as the Cassandra user base grows (and it will, because it fills a
niche that no other KVS or RDBMS quite fills), I think N+1 capability
is something that will need to be solved fairly soon for widespread
adoption.

-Jon
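P.S. Here's a rough, back-of-the-envelope sketch of the placement
behavior I'm describing, since "iterating through the node list" is
easier to see in code. To be clear, this is not Cassandra's actual
RackAwareStrategy source; the Node class, getReplicas(), and the ring
representation are all made up for illustration. It just mimics the
behavior discussed above: one replica in another DC, one in another
rack, and everything after that in plain ring order.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a node with topology info.
class Node {
    final String name, dc, rack;
    Node(String name, String dc, String rack) {
        this.name = name;
        this.dc = dc;
        this.rack = rack;
    }
    public String toString() { return name + "(" + dc + "/" + rack + ")"; }
}

class RackAwareSketch {
    // ring: nodes in token order, starting at the primary replica.
    static List<Node> getReplicas(List<Node> ring, int replicas) {
        Set<Node> result = new LinkedHashSet<Node>();
        Node primary = ring.get(0);
        result.add(primary);

        // Second replica: first node in a *different data center*.
        for (Node n : ring) {
            if (result.size() >= replicas) break;
            if (!n.dc.equals(primary.dc)) { result.add(n); break; }
        }

        // Third replica: first node in the same DC but a *different rack*.
        for (Node n : ring) {
            if (result.size() >= replicas) break;
            if (n.dc.equals(primary.dc) && !n.rack.equals(primary.rack)) {
                result.add(n);
                break;
            }
        }

        // Remaining replicas: plain ring order. No balancing logic here;
        // this is the step that can skew load above 3 replicas.
        for (Node n : ring) {
            if (result.size() >= replicas) break;
            result.add(n);
        }
        return new ArrayList<Node>(result);
    }

    public static void main(String[] args) {
        List<Node> ring = Arrays.asList(
            new Node("a", "DC1", "r1"), new Node("b", "DC1", "r1"),
            new Node("c", "DC1", "r2"), new Node("d", "DC2", "r1"));
        // With 4 replicas, the 4th falls back to ring order and lands
        // on b, doubling up the primary's own DC and rack:
        // [a(DC1/r1), d(DC2/r1), c(DC1/r2), b(DC1/r1)]
        System.out.println(RackAwareSketch.getReplicas(ring, 4));
    }
}

Run that against a ring where the primary's DC and rack are already
heavy, and the fourth replica lands right back on the primary's rack.
That's the skew I mean, and it's why I think anything past 3 replicas
deserves either smarter placement or a documented caveat.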