From: Brian Bulkowski
Date: Tue, 13 Oct 2009 14:52:31 -0700
Message-ID: <4AD4F69F.5020802@bulkowski.org>
To:
cassandra-user@incubator.apache.org
Subject: eventual consistency question

Greetings,

I'm evaluating Cassandra, like others. I've scrubbed through the mail digest, blog posts and whatnot, and I've seen my question asked, but I'm not clear on the answers. I'm doing what others have done: using 3 servers and doing a few test inserts to understand the data and consistency model.

Question 1: the bootstrap parameter: what does it do, exactly? It seems the right thing to do, just playing around, is to start the first node without bootstrap and the other two with bootstrap. But I don't know the hows or whys.

Question 2: "how eventual is eventual?" Imagine the following case:

- Defaults from storage-conf.xml + replication count 2 (and the required IP addresses, etc.)
- Up server A (no -b)
- Insert a few values, read, all is good (using _cli)
- Up servers B and C (with -b)
- Read values from A, B, or C: all is good, appears to be reading from A
- Wait a few minutes: servers appear quiescent
- Down server A
- Read values from B: values are not available (NullPointerException on the server and in the _cli interface)

I've read that Cassandra doesn't optimistically replicate, so I understand in theory that the data inserted into A shouldn't replicate. I believe that if I had used the proper Thrift interface and asked for replication count 2, the transaction would have failed.

Yet I expect that if I ask for replication count 2, I should get it. At some point. Eventually. The data has been inserted.
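To make the scenario above concrete, here is a minimal sketch of replica placement on a token ring. It assumes rack-unaware placement (the node whose token is the first at or after the key's token, plus the next nodes clockwise) and MD5-derived tokens in 0..2^127, similar in spirit to Cassandra's RandomPartitioner. The node names, tokens, and the helpers `token_for` and `replicas` are hypothetical, and Cassandra's real placement code differs by version.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space, in the style of an MD5-based random partitioner


def token_for(key: str) -> int:
    """Hash a row key onto the ring (MD5, reduced modulo the ring size)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


def replicas(key: str, ring: dict[int, str], rf: int) -> list[str]:
    """Return up to `rf` distinct nodes for `key` on a non-empty ring.

    The first replica is the node with the first token >= the key's token
    (wrapping past the end of the ring); further replicas are the next
    nodes clockwise. With fewer nodes than `rf`, fewer replicas exist.
    """
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    count = min(rf, len(tokens))  # cannot place more replicas than there are nodes
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


R = RING_SIZE
only_a = {R // 3: "A"}                         # hypothetical ring: A alone
full = {0: "C", R // 3: "A", 2 * R // 3: "B"}  # hypothetical ring: A, B, C
print(replicas("k1", only_a, 2))  # a single replica, regardless of rf
print(replicas("k1", full, 2))    # two distinct nodes
```

In this model, with only A in the ring every key maps to a single replica no matter what replication count is configured; once B and C join, each key maps to two nodes, so the second copy of data written earlier would have to arrive through some later mechanism rather than from the original write.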
I expect the cluster to work toward replication count 2 regardless of the current state of the cluster. Is there a way to achieve this behavior?

Question 3: "balancing". This question is similar to Question 2, approached from a different angle. I have three nodes which I brought up at the dawn of time. They've taken a lot of inserts and hold 1 TB each. Let's say the load is now mostly reads, since the data has already been inserted. I bring up a fourth node. Clients (aka app servers) are pointing at the first 3 nodes. I have to reconfigure those servers to start using the 4th server, right? New writes may take advantage of the 4th server, but no data will move automatically? That would mean the servers are out of balance, perhaps for a long time, perhaps forever?

Thanks for the hints. I'm clearly not "getting" Cassandra yet and don't want to foolishly misrepresent it.

Thanks,
-brianb
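The balancing concern in Question 3 can be sketched on the same kind of ring. Assuming each node holds a single token and owns the range from the previous token (exclusive) up to its own token (inclusive), the hypothetical helper `ownership` below computes each token's share of a 2^127 ring. Whether Cassandra moves existing data when a node joins, and how it picks the new node's token, are not modeled here.

```python
RING_SIZE = 2 ** 127  # assumed token space, as in the previous sketch


def ownership(tokens: list[int]) -> dict[int, float]:
    """Fraction of the ring owned by each token: the range (previous token, token]."""
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}  # a lone node owns the whole ring
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # for i == 0 this wraps around to the last token
        shares[tok] = ((tok - prev) % RING_SIZE) / RING_SIZE
    return shares


R = RING_SIZE
three = ownership([0, R // 3, 2 * R // 3])         # three evenly spaced nodes
four = ownership([0, R // 6, R // 3, 2 * R // 3])  # fourth node splits one range
print(three)
print(four)
```

With three evenly spaced tokens each node owns about a third of the ring. Adding a fourth token halfway into one range splits only that third between its old owner and the new node, while the other two nodes still own about a third each.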