cassandra-user mailing list archives

From: Brian Bulkowski
Subject: eventual consistency question
Date: Tue, 13 Oct 2009 21:52:31 GMT

I'm evaluating Cassandra, like others. I've scrubbed through the mail 
digest and blog posts and whatnot, and I've seen my question asked but 
I'm not clear on the answers.

I'm doing what others have done: using 3 servers and doing a few test 
inserts to understand the data and consistency model.

Question 1:
    the bootstrap parameter: what does it do, exactly?
    It seems the right thing to do, just playing around, is to start the 
first node with no bootstrap, and the other two with bootstrap.
    But I don't know the hows or whys.

Question 2:
    "how eventual is eventual?"
    Imagine the following case:
       Defaults from storage-conf.xml + replication count 2 (and the IP 
addresses required, etc)
       Up server A (no -b)
       Insert a few values, read, all is good (using _cli)
       Up server B, C (with -b)
       read values from A, B, or C - all is good, appears to be reading 
from A
       wait a few minutes - servers appear quiescent.
       Down server A
       read values from B - values are not available (NPE on the 
server and in the _cli interface)

So I read that Cassandra doesn't optimistically replicate, so I 
understand in theory that the data inserted to A shouldn't replicate.
I believe if I used the proper thrift interface and asked for replication 
count 2, the transaction would have failed.
Yet, I expect that if I asked for replication count 2, I should get it. 
At some point. Eventually. The data has been inserted.
I expect the cluster to work toward replication count 2 regardless of 
the current state of the cluster. Is there a way to achieve this?
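
The expectation in Question 2 can be stated as a toy model. The sketch below is not Cassandra code and makes no claim about how Cassandra actually behaves; `ToyCluster`, `insert`, `repair_step`, and `run_scenario` are hypothetical names invented for illustration. It only encodes the hoped-for behavior: a write accepted while one node is up is later copied to other nodes until the replication count is met, so it survives the loss of the original node.

```python
# Toy model only: NOT Cassandra code and not a description of its behavior.
# All names here are hypothetical and exist only to illustrate the question.

class ToyCluster:
    def __init__(self, replication_count):
        self.replication_count = replication_count
        self.nodes = {}  # node name -> {key: value}; only "up" nodes appear

    def up(self, name):
        self.nodes.setdefault(name, {})

    def down(self, name):
        self.nodes.pop(name, None)

    def insert(self, key, value):
        # Write to as many up nodes as exist, at most replication_count.
        targets = sorted(self.nodes)[: self.replication_count]
        for name in targets:
            self.nodes[name][key] = value
        return len(targets)

    def replica_count(self, key):
        return sum(1 for data in self.nodes.values() if key in data)

    def repair_step(self):
        # The behavior being asked about: copy under-replicated keys to
        # nodes that lack them until replication_count is reached.
        known = {}
        for data in self.nodes.values():
            known.update(data)
        for key, value in known.items():
            missing = self.replication_count - self.replica_count(key)
            for name in sorted(self.nodes):
                if missing <= 0:
                    break
                if key not in self.nodes[name]:
                    self.nodes[name][key] = value
                    missing -= 1

    def read(self, key):
        for name in sorted(self.nodes):
            if key in self.nodes[name]:
                return self.nodes[name][key]
        return None


def run_scenario():
    cluster = ToyCluster(replication_count=2)
    cluster.up("A")
    written = cluster.insert("k1", "v1")   # only A is up at this point
    cluster.up("B")
    cluster.up("C")
    cluster.repair_step()                  # the hoped-for "eventually"
    replicas = cluster.replica_count("k1")
    cluster.down("A")
    return written, replicas, cluster.read("k1")
```

In this model the read from the surviving nodes succeeds after A goes down; the test described above observed the opposite, which is the gap the question is about.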

Question 3:
       This question is similar to question 2, approached from a different angle.
       I have three nodes which I brought up at the dawn of time. 
They've taken a lot of inserts, and hold 1 TB each.
       Let's say the load is now mostly reads, as the data has already 
been inserted.
       I bring up a fourth node.
       Clients (aka app servers) are pointing at the first 3 nodes. I 
have to reconfigure those servers to start using the 4th server, right?
       New writes may take advantage of the 4th server, but no data will 
automatically move?
       Which would mean that the servers would be out of balance, 
perhaps for a long time, perhaps forever?

Thanks for the hints - I'm clearly not "getting" Cassandra yet and don't 
want to foolishly misrepresent it.

