Subject: Re: CL - locally consistent ONE
From: Peter Schuller <scode@scode.org>
To: user@cassandra.apache.org
Date: Fri, 28 Oct 2011 11:45:54 +0200
> I've patched the classes WriteResponseHandler and ReadCallback to make sure
> that the local node has returned before sending the condition signal. Can
> anyone see any drawbacks with this approach? I realize this will only work
> as long as the replication factor is the same as the number of nodes, but
> that is ok for our scenario.

So the "local" node is the co-ordinator. Is it the case that each CMS
instance (with embedded Cassandra) always uses "itself" as the co-ordinator,
and that your requirement is that *that particular CMS instance* must see
its *own* writes? And is the reason you are using RF = number of nodes that
you want the data to always be on the local node?

If so, it *seems* to me it should work, *kind of*, as long as the CMS
instances never use another Cassandra node, *and* as long as you accept that
a write may disappear in the case of a sudden node failure (as usual with
CL.ONE). It does feel like a fragile approach, though, and one that would be
nice to avoid if possible/realistic.

I am also curious about the performance angle. It seems a lot safer to just
use QUORUM, *at least* for writes. Keep in mind that even at CL.ONE your
writes still go to all the other replicas (in this case all nodes, since you
say RF = cluster size), so in terms of throughput CL.ONE should not be
faster. It should be a bit better for latency in the common case, though
(which might translate to throughput for a sequential user). If you can do
writes at QUORUM, even if not reads, you also avoid the problem of an
acknowledged write disappearing on node failure. Are the nodes in your
cluster far apart (e.g. in different DCs), such that QUORUM writes would add
significant latency?

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
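For illustration, the behaviour the quoted patch describes could be sketched roughly as below. This is a hypothetical standalone class, not Cassandra's actual WriteResponseHandler or ReadCallback: a CL.ONE-style response handler that refuses to report completion until the co-ordinator's own (local) replica has acknowledged, in addition to satisfying the consistency-level count.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch, NOT Cassandra's real handler classes: a response
// handler that counts replica acks toward the consistency level, but only
// considers the request done once the local node itself has also acked.
class LocallyConsistentOneHandler {
    private final CountDownLatch responses;      // acks required by the CL (1 for ONE)
    private volatile boolean localAcked = false; // has the co-ordinator's own replica acked?

    LocallyConsistentOneHandler(int blockFor) {
        this.responses = new CountDownLatch(blockFor);
    }

    // Called once per replica ack; fromLocalNode marks the co-ordinator's own ack.
    void onResponse(boolean fromLocalNode) {
        if (fromLocalNode)
            localAcked = true;
        responses.countDown();
    }

    // True only when the CL count is satisfied AND the local replica has acked.
    // A real implementation would signal a waiting thread here instead of
    // exposing a polling check.
    boolean isDone() {
        return responses.getCount() == 0 && localAcked;
    }
}
```

With blockFor = 1, a remote ack alone satisfies the latch, but isDone() stays false until the local node responds, which is the extra condition the quoted patch adds before sending the condition signal.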