cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@yakaz.com>
Subject Re: Consitency level ONE
Date Tue, 29 Jun 2010 11:03:47 GMT
> Hi all,
>
> I'm having some issues with read consistency level ONE. The Wiki (and other
> sources) say the following:
>
> Will return the record returned by the first node to respond. A consistency
> check is always done in a background thread to fix any consistency issues
> when ConsistencyLevel.ONE is used. This means subsequent calls will have
> correct data even if the initial read gets an older value. (This is called
> read repair.)
>
> However, when looking at the code, it seems that the read is only directed
> towards the first node that is suitable (and alive). This means that a slow
> node will cause slow responses even though my replication factor is > 1. I
> would expect the read to go to all the suitable nodes and as soon as one of
> those nodes responds, the reply is used (just as the documentation says).
>
> Moving to Quorum reads would solve part of this problem, but with one server
> down and 1 slow one, I'm back to square one.

This would not solve part of the problem.

When you do a QUORUM read, the values of the requested columns are not fetched
from every replica. Instead, the full value is requested from one node and only
a digest of the value is requested from the others. This avoids unnecessary
traffic between nodes and saves bandwidth: under normal conditions you expect
all replicas to hold exactly the same value, so transferring the full data from
each of them would be wasteful. Only if the value and a digest do not match are
the actual values requested from the other replicas.
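
As a rough sketch of the digest read described above (Python, not Cassandra's
actual code): the Replica class and the quorum_read function are made up for
illustration, and SHA-256 is only an assumed stand-in for whatever digest the
server computes. It returns the full values that had to be transferred, keyed
by replica name.

```python
import hashlib


def digest(value: bytes) -> bytes:
    # Assumption: SHA-256 stands in for the real digest function.
    return hashlib.sha256(value).digest()


class Replica:
    """Hypothetical replica holding a single column value."""

    def __init__(self, name: str, value: bytes):
        self.name = name
        self.value = value

    def read_data(self) -> bytes:
        return self.value

    def read_digest(self) -> bytes:
        return digest(self.value)


def quorum_read(replicas, quorum):
    """Return {replica name: full value} for every value actually transferred."""
    contacted = replicas[:quorum]
    data_replica, digest_replicas = contacted[0], contacted[1:]

    # One full value, digests only from the rest.
    value = data_replica.read_data()
    expected = digest(value)
    transferred = {data_replica.name: value}

    if any(r.read_digest() != expected for r in digest_replicas):
        # Mismatch: only now are the actual values requested.
        for r in digest_replicas:
            transferred[r.name] = r.read_data()
    return transferred
```

With consistent replicas only one full value crosses the wire; with a stale
replica among those contacted, the full values are fetched so that they can be
reconciled.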

The same applies to CL.ONE. The background consistency check only asks for
digests, which saves a lot of internal bandwidth.

Now, back to the slow node problem. The code already does its best to ask the
best-suited node: first by retrieving the data locally if possible, then by
using the EndpointSnitch, which you can configure to tell Cassandra which node
is best suited.
That leaves the case of a node that is slow because of a temporary problem,
either a network issue or because the node is overloaded and cannot keep up.
But Cassandra chooses to optimize for the normal case rather than the error
case, which I believe is the right choice.
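
To make the selection order concrete, here is a minimal sketch in Python (again
not the real code). The function pick_data_replica and the snitch_rank callable
are hypothetical names: snitch_rank stands for whatever ordering your configured
EndpointSnitch gives, with lower meaning closer.

```python
def pick_data_replica(replicas, local, alive, snitch_rank):
    """Choose the single node that will be asked for the full value."""
    candidates = [r for r in replicas if r in alive]
    if not candidates:
        raise RuntimeError("no live replica for this key")
    # Prefer the local node when it holds the data.
    if local in candidates:
        return local
    # Otherwise take the node the snitch considers closest.
    return min(candidates, key=snitch_rank)
```

Note that nothing here measures how fast a node currently answers: a replica
that is alive but temporarily slow can still be picked, which is the case
discussed above.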

--
Sylvain

>
> Greetings,
>
> Wouter
>
