incubator-cassandra-user mailing list archives

From Vasileios Vlachos <vasileiosvlac...@gmail.com>
Subject Re: Replication Factor and Consistency Level Confusion
Date Thu, 20 Dec 2012 09:26:25 GMT
Hello,

Thank you very much for your quick responses.

Initially we were thinking the same thing, that an explanation would
be that the "wrong" node could be down, but then isn't this something
that hinted handoff sorts out? So actually, Consistency Level refers
to the number of replicas, not the total number of nodes in a cluster.
Keeping that in mind and assuming that hinted handoff has nothing to
do with that as I thought, I could explain some results but not all.
Let me explain:

Test 1 (3/3 Nodes UP):
CL  :    ANY     ONE    TWO    THREE    QUORUM   ALL
RF 3:    OK      OK     OK     OK       OK       OK

Test 2 (2/3 Nodes UP):
CL  :    ANY    ONE    TWO    THREE    QUORUM    ALL
RF 2:    OK     OK     x      x        OK        x

Test 3 (2/3 Nodes UP):
CL  :    ANY    ONE    TWO    THREE    QUORUM    ALL
RF 3:    OK     OK     x      x        OK        OK

Test 1:
Everything was fine because all nodes were up and the RF does not
exceed the total number of nodes (if it did, writes would be
blocked).

Test 2:
CL=TWO did not work because we were "unlucky" and the "wrong" node,
responsible for the key range we were trying to insert, was DOWN (I
can accept that for now, however I do not quite understand why this
isn't sorted out by hinted handoff). My explanation might be wrong
again, but CL=THREE should fail because we have only set RF=2, so
there isn't a 3rd replica anywhere anyway. Why did CL=QUORUM not fail
then? Since QUORUM=(RF/2)+1=2 in this case, the write operation should
try to write to 2 replicas, one of which, the one responsible for that
range as we said, is DOWN. I would expect CL=TWO and CL=QUORUM to have
the same outcome in this case. Why is that not the case? CL=ALL fails
for the same reason as CL=TWO, I presume.
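
The arithmetic in the Test 2 discussion can be written out as a small
sketch. It is only a model, not Cassandra code: `required_acks` and
`write_can_succeed` are hypothetical names, integer division in the
QUORUM formula is assumed, and CL=ANY is assumed to be satisfiable by a
stored hint alone.

```python
def required_acks(cl: str, rf: int) -> int:
    """Replica acknowledgements a write at consistency level `cl` waits for.

    Hypothetical helper: `rf` is the keyspace replication factor.
    """
    fixed = {"ANY": 1, "ONE": 1, "TWO": 2, "THREE": 3}
    if cl in fixed:
        return fixed[cl]
    if cl == "QUORUM":
        return rf // 2 + 1  # integer division: RF=2 -> 2, RF=3 -> 2
    if cl == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {cl}")


def write_can_succeed(cl: str, rf: int, live_replicas: int) -> bool:
    """True if `live_replicas` (at most `rf`) can satisfy `cl` for one key."""
    if cl == "ANY":
        # Assumption: a hint stored on the coordinator is enough for ANY.
        return True
    return live_replicas >= required_acks(cl, rf)


if __name__ == "__main__":
    for rf, live in [(2, 1), (2, 2), (3, 2)]:
        row = {cl: write_can_succeed(cl, rf, live)
               for cl in ["ANY", "ONE", "TWO", "THREE", "QUORUM", "ALL"]}
        print(f"RF={rf}, live replicas={live}: {row}")
```

Under this model, CL=TWO and CL=QUORUM wait for the same two
acknowledgements at both RF=2 and RF=3, so the model by itself does not
explain them behaving differently in the tests.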

Test 3:
I was expecting only CL=ANY and CL=ONE to work in this case. CL=TWO
does not work because, just like in Test 2, the node responsible for
that particular key range is DOWN. If that's the case, why was
CL=QUORUM successful? The only explanation I can think of at the
moment is that QUORUM explicitly refers to the total number of nodes
in the cluster rather than the number of replicas determined by the
RF. CL=THREE seems easy: it fails because one of the three replicas is
DOWN. CL=ALL is confusing as well. If my understanding is correct and
ALL means all replicas, 3 in this case, then the operation should fail
because one replica is DOWN, and I cannot be "lucky" enough to have
the right node DOWN, because RF=3. So, every node should have a copy
of the data.

Furthermore, with regard to being "unlucky" with the "wrong node", if
this is actually what is happening, how is it possible to ever have a
node-failure-resilient Cassandra cluster? My understanding of this
implies that even with 100 nodes, 1 in every 100 writes would fail
until the node is replaced/repaired.
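
As a rough check on the 100-node estimate, the sketch below counts the
ranges affected by a single node failure. It assumes equal-sized
ranges, uniformly distributed keys, and that each range is stored on
its primary node plus the next RF-1 nodes around the ring (the layout
in the quoted example); `failed_fraction` is a hypothetical helper, not
part of Cassandra.

```python
def failed_fraction(n_nodes: int, rf: int, required: int, down: int = 0) -> float:
    """Fraction of primary ranges that cannot collect `required` replica
    acknowledgements while node `down` is offline.

    Assumed placement: range i lives on nodes i, i+1, ..., i+rf-1 (mod n_nodes).
    """
    failed = 0
    for primary in range(n_nodes):
        replicas = [(primary + i) % n_nodes for i in range(rf)]
        live = sum(1 for node in replicas if node != down)
        if live < required:
            failed += 1
    return failed / n_nodes


if __name__ == "__main__":
    print(failed_fraction(3, 2, required=2))    # 3 nodes, RF=2, CL=TWO
    print(failed_fraction(100, 3, required=2))  # 100 nodes, RF=3, CL=QUORUM
    print(failed_fraction(100, 3, required=3))  # 100 nodes, RF=3, CL=ALL
```

Under these assumptions, with 100 nodes and RF=3 no range loses QUORUM
when one node is down; only writes that need all three replicas are
affected, and only for 3 of the 100 ranges.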

Thank you very much in advance.

Vasilis

On Wed, Dec 19, 2012 at 4:18 PM, Roland Gude <roland.gude@ez.no> wrote:
>
> Hi
>
> RF 2 means that 2 nodes are responsible for any given row (no matter how
> many nodes are in the cluster)
> For your cluster with three nodes let's just assume the following
> responsibilities
>
> Node            A               B               C
> Primary keys    0-5             6-10            11-15
> Replica keys    11-15           0-5             6-10
>
> Assume node 'C' is down
> Writing any key in range 0-5 with consistency TWO is possible (A and B are
> up)
> Writing any key in range 11-15 with consistency TWO will fail (C is down
> and 11-15 is its primary range)
> Writing any key in range 6-10 with consistency TWO will fail (C is down
> and it is the replica for this range)
>
> I hope this explains it.
>
> -----Original Message-----
> From: Vasileios Vlachos [mailto:vasileiosvlachos@gmail.com]
> Sent: Wednesday, 19 December 2012 17:07
> To: user@cassandra.apache.org
> Subject: Replication Factor and Consistency Level Confusion
>
> Hello All,
>
> We have a 3-node cluster and we created a keyspace (say Test_1) with
> Replication Factor set to 3. I know this is not great but we wanted to test
> different behaviors. So, we created a Column Family (say cf_1) and we tried
> writing something with Consistency Level ANY, ONE, TWO, THREE, QUORUM and
> ALL. We did that while all nodes were in UP state, so we had no problems at
> all. No matter what the Consistency Level was, we were able to insert a
> value.
>
> Same cluster, different keyspace (say Test_2) with Replication Factor set
> to 2 this time and one of the 3 nodes deliberately DOWN. Again, we created a
> Column Family (say cf_1) and we tried writing something with different
> Consistency Levels. Here is what we got:
> ANY: worked (expected...)
> ONE: worked (expected...)
> TWO: did not work (WHAAAAT???)
> THREE: did not work (expected...)
> QUORUM: worked (expected...)
> ALL: did not work (expected I guess...)
>
> Now, we know that QUORUM derives from (RF/2)+1, so we were expecting that
> to work, after all only 1 node was DOWN. Why did Consistency Level TWO not
> work then???
>
> Third test... Same cluster again, different keyspace (say Test_3) with
> Replication Factor set to 3 this time and 1 of the 3 nodes deliberately DOWN
> again. Same approach again, created a different Column Family (say cf_1) and
> different Consistency Level settings resulted in the following:
> ANY: worked (whaaaaat???)
> ONE: worked (whaaaaat???)
> TWO: did not work (whaaaaat???)
> THREE: did not work (expected...)
> QUORUM: worked (whaaaaat???)
> ALL: worked (whaaaaat???)
>
> We thought that if the Replication Factor is greater than the number of
> nodes in the cluster, writes are blocked.
>
> Apparently we are completely missing a level of understanding here, so
> we would appreciate any help!
>
> Thank you in advance!
>
> Vasilis
>
>
