cassandra-user mailing list archives

From "Nair, Rajesh" <>
Subject RE: Cassandra 2 DC deployment
Date Wed, 13 Apr 2011 14:30:20 GMT
Peter, all great questions. Let me try to answer them.

You are right about the automatic fallback to ONE. It's quite possible that if 2 nodes die for
some reason, I will have the same problem. So the right thing to do would probably be to read/write
at ONE only when we lose a DC, by making a manual configuration change. Since we shouldn't be
losing DCs that often, this should be an acceptable change. So my follow-up questions would
be:
When would be the right time to start reading/writing at QUORUM again? 
Should we be marking the 2 nodes in the lost DC as down?
Should we be doing some administrative work on Cassandra before we start reading/writing at
QUORUM again?

I am trying to define a process for when we lose a DC.
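
To make the idea concrete, the manual switch described above could be sketched as follows. This is only an illustration: ManualConsistencySwitch and its method names are hypothetical and not part of Cassandra or any client library, and the sketch only tracks which level the application should request.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    ONE = "ONE"
    QUORUM = "QUORUM"


class ManualConsistencySwitch:
    """Holds the consistency level the application requests for reads/writes.

    Hypothetical helper: the level changes only when an operator calls
    degrade() or restore(). A failed QUORUM request never changes it.
    """

    def __init__(self):
        self.level = ConsistencyLevel.QUORUM
        self.reason = None

    def degrade(self, reason):
        # Operator action, taken after confirming that a whole DC is lost.
        self.level = ConsistencyLevel.ONE
        self.reason = reason

    def restore(self):
        # Operator action, taken once the lost DC is judged healthy again.
        self.level = ConsistencyLevel.QUORUM
        self.reason = None


switch = ManualConsistencySwitch()
switch.degrade("remote DC unreachable")
print(switch.level.value)  # level the application would now request
```

The point of the design is that a transient QUORUM failure surfaces as an error to the application rather than silently lowering the consistency level.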


-----Original Message-----
From: [] On Behalf Of Peter Schuller
Sent: Tuesday, April 12, 2011 4:54 PM
Subject: Re: Cassandra 2 DC deployment

> When the down data center comes back up, the QUORUM reads will result in a read-repair,
> so you will get valid data. Besides that, hinted handoff will take care of getting data
> replicated to a previously down node.

*Eventually* though, but yes. I.e., there would be no expectation to instantly go back to
full consistency once it goes back up.

Also, I would argue that it's useful to consider this: if you're implementing "automatic"
fallback to ONE whenever QUORUM fails, consider all the cases where this might happen for reasons
*other* than there being a legitimate partition of the DCs. For example, some random networking
issues causing fewer nodes to be up, etc.
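
A small sketch of why this matters, assuming (purely for illustration) a replication factor of 4 with 2 replicas in each DC; the thread does not state the actual topology. The function names are hypothetical. QUORUM needs floor(RF / 2) + 1 replicas, so the fallback triggers on the replica count alone, whatever the cause:

```python
def quorum(replication_factor):
    # QUORUM requires a majority of the replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def effective_level(live_replicas, replication_factor):
    """Level that an 'automatic fallback' client would end up using."""
    if live_replicas >= quorum(replication_factor):
        return "QUORUM"
    if live_replicas >= 1:
        return "ONE"
    return "UNAVAILABLE"


RF = 4  # assumption: 2 replicas per DC across 2 DCs

# A whole DC is lost: 2 of 4 replicas remain.
print(effective_level(2, RF))
# Two replicas are unreachable for unrelated reasons: also 2 of 4 remain.
print(effective_level(2, RF))
# One replica is down: QUORUM (3 of 4) is still reachable.
print(effective_level(3, RF))
```

The first two cases are indistinguishable to the client, which is the concern: the fallback cannot tell a lost DC from any other loss of two replicas.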

A valid question is: If you simply do automatic fallback whenever QUORUM fails anyway, are
you significantly increasing consistency with respect to ONE anyway? In some cases yes, but
just be sure you know what you're doing... Keep in mind that when all nodes are up and all
is working well, CL.ONE doesn't mean that writes won't be replicated to all nodes. It just
means that only one is *required* - and same for reads.

If you have some situation whereby you normally want the strict requirement that a read subsequent
to a write sees the written data, that doesn't sound very compatible with automatically falling
back to CL.ONE...
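
The incompatibility can be checked with the usual overlap rule: a read is guaranteed to see a prior successful write only if the replicas read plus the replicas written exceed the replication factor (R + W > RF). A minimal sketch, with hypothetical function names and an example replication factor of 3 chosen only for illustration:

```python
def replicas_required(level, replication_factor):
    # Number of replicas that must respond for the given consistency level.
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    raise ValueError(f"unsupported level: {level}")


def read_sees_latest_write(read_level, write_level, replication_factor):
    # The read set and write set are guaranteed to overlap only if R + W > RF.
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


RF = 3  # example value, not taken from the thread
print(read_sees_latest_write("QUORUM", "QUORUM", RF))  # 2 + 2 > 3
print(read_sees_latest_write("ONE", "ONE", RF))        # 1 + 1 > 3 does not hold
```

So once either side drops to ONE, the read-after-write guarantee is gone until both sides are back at QUORUM, even though the data will usually still reach all replicas eventually.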

Anyways, those are my off-the-cuff thoughts - maybe it doesn't apply in the situation in question.
/ Peter Schuller

