On Sep 18, 2009, at 9:55 PM, Jonathan Ellis wrote:

On Fri, Sep 18, 2009 at 9:09 PM, Jonathan Mischo <jmischo@quagility.com> wrote:
Multiple data center replication in the background. Maybe a
multi-master type thing.

It already has this. It was built from the ground up for this. It's highly
tolerant to partitioning and has always-available writes. All replication is
done in the background (unless you specifically set a write to a high
consistency level).

You know, it does and it doesn't.  RackAwareStrategy isn't a true N+1
scaling solution. Currently, RackAwareStrategy only guarantees that it will
try to replicate data to one other data center and/or one other rack,
depending on the number of replicas specified.

Yes; that's what it's supposed to do, and it's satisfying a very real
use case: "I want my data's primary data center to be DC A, but I want
one replica in DC B in case A is completely unavailable."

Other use cases can use different Strategies.  That's why they're
pluggable.  It's not one-size-fits-all and it's not supposed to be.

Yeah, you're right. If N+1 is a concern, it should probably be a separate strategy, unless we can keep the complexity virtually the same, because of how heavily it's called. RackAwareStrategy is perfectly fine for what it does: it guarantees a replica in a different DC and/or a replica in a different rack after that, if you configure it to store more than one replica. Above 3 replicas, though, it can start to get unbalanced, since it's just iterating through the node list, and that order has no real value for placement. We could probably just document that for RackAwareStrategy.
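
To make the placement behavior described above concrete, here is a rough sketch in Python. It is not Cassandra's actual code: the `Node` tuple, the `pick_replicas` function, and the example ring are all hypothetical, and it assumes "a different rack" means a different rack within the primary's data center. It only mirrors the order described in this thread: one replica in another DC, then one in another rack, then whatever comes next in the node list.

```python
from collections import namedtuple

# Hypothetical node record: a name plus the data center and rack it lives in.
Node = namedtuple("Node", ["name", "dc", "rack"])


def pick_replicas(ring, start, n):
    """Pick up to n replica nodes for a key whose primary is ring[start].

    ring is a list of Node in ring order. This is an illustrative
    approximation of the behavior discussed in the thread, not the
    real RackAwareStrategy implementation.
    """
    if n <= 0 or not ring:
        return []
    n = min(n, len(ring))
    primary = ring[start]
    replicas = [primary]

    # Every other node, in the order it follows the primary around the ring.
    walk = [ring[(start + i) % len(ring)] for i in range(1, len(ring))]

    # Second replica: the first node found in a different data center.
    if len(replicas) < n:
        other_dc = next((x for x in walk if x.dc != primary.dc), None)
        if other_dc is not None:
            replicas.append(other_dc)

    # Third replica: the first node in the primary's DC on a different rack
    # (assumption: "different rack" is scoped to the primary's DC).
    if len(replicas) < n:
        other_rack = next(
            (x for x in walk if x.dc == primary.dc and x.rack != primary.rack),
            None,
        )
        if other_rack is not None:
            replicas.append(other_rack)

    # Any remaining replicas: just take nodes in ring order, ignoring
    # DC and rack entirely. This is where placement can get unbalanced.
    for x in walk:
        if len(replicas) >= n:
            break
        if x not in replicas:
            replicas.append(x)
    return replicas


# Hypothetical six-node ring: five nodes in DC "A", one in DC "B".
example_ring = [
    Node("a1", "A", "r1"),
    Node("a2", "A", "r1"),
    Node("a3", "A", "r2"),
    Node("b1", "B", "r1"),
    Node("a4", "A", "r1"),
    Node("a5", "A", "r1"),
]

if __name__ == "__main__":
    print([x.name for x in pick_replicas(example_ring, 0, 3)])
    print([x.name for x in pick_replicas(example_ring, 0, 5)])
```

With three replicas the sketch picks a1, b1, and a3: the primary, one node in the other DC, and one node on another rack. With five replicas, the two extra copies go to a2 and a4 simply because they come next in the list, so four of the five replicas end up in DC A and three of them share a rack with the primary.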

I know we're trying to go for the biggest wins for the effort, but as the Cassandra user base grows (and it will, because it fills a niche that no other KVS or RDBMS quite fills), I think N+1 capability is something that will need to be solved fairly soon for widespread adoption.

-Jon