Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <DUB127-W4398B3600065D05AD688D3F62C0@phx.gbl>
References: <DUB127-W396DF38AD213943BCCA96BF62C0@phx.gbl>
	<DUB127-W1042764D2976CB65075D4BF62C0@phx.gbl>
	<CADeXowg9Ds7sG2=D76vCUHPoviwhGJv5V=MsoypWd2ttVpOq-Q@mail.gmail.com>
	<DUB127-W4398B3600065D05AD688D3F62C0@phx.gbl>
Date: Wed, 4 Nov 2015 13:28:38 -0800
Message-ID: 
 <CANeMN=8_8AR3RfoyBfmRMUsqeqBnrzMvg_vFKcxhx=10dc_PEA@mail.gmail.com>
Subject: Re: Two node cassandra cluster doubts
From: Bryan Cheng <bryan@blockcypher.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001a11c30caa8df06d0523bdb12c

--001a11c30caa8df06d0523bdb12c
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I believe what's going on here is this step:


Select Count (*) From MYTABLE;---> 15 rows

Shut down Node B.

Start Up Node B.

Select Count (*) From MYTABLE;---> 15 rows


To understand why this is an issue, consider how Cassandra tries to keep
replicas consistent. With RF=2 (you should really use an odd RF, such as 3,
with LOCAL_QUORUM so that you can tolerate a node failure, but that's another
matter), your write hits Node B and is queued for delivery to Node A through
a process called hinted handoff. Normally, the hints are replayed when Node A
rejoins the cluster, provided it comes back within max_hint_window_in_ms, so
that all the writes it missed are applied. However, since Node B also goes
down during this period, it loses the queued hints, and Node A never
receives those writes.
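
To make the sequence concrete, here is a toy Python model of it. It assumes, as described above, that the hints Node B holds for Node A are discarded when Node B restarts. `Replica`, `write`, `restart` and `deliver_hints` are hypothetical names for this sketch, not Cassandra APIs, and the model ignores everything else Cassandra does (commit log, gossip, timestamps).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: a set of row keys plus hints held for peers (hypothetical model)."""
    name: str
    rows: set = field(default_factory=set)
    hints: dict = field(default_factory=dict)  # peer name -> list of missed row keys
    up: bool = True


def write(coordinator, peer, key):
    """Write at CL=ONE: apply locally, store a hint if the peer is down."""
    coordinator.rows.add(key)
    if peer.up:
        peer.rows.add(key)
    else:
        coordinator.hints.setdefault(peer.name, []).append(key)


def restart(node):
    """Toy assumption taken from the reply above: hints do not survive a restart."""
    node.hints.clear()
    node.up = True


def deliver_hints(holder, peer):
    """Replay whatever hints the holder still has for the peer once it is back."""
    for key in holder.hints.pop(peer.name, []):
        peer.rows.add(key)


initial = {f"row{i}" for i in range(10)}
a = Replica("A", set(initial))
b = Replica("B", set(initial))

a.up = False                      # shut down Node A
for i in range(10, 15):
    write(b, a, f"row{i}")        # inserts on Node B are hinted for Node A
b.up = False                      # shut down Node B ...
restart(b)                        # ... and start it again: its hints are gone
a.up = True                       # start up Node A
deliver_hints(b, a)               # nothing left to replay

print(len(a.rows), len(b.rows))   # 10 15
```

Without the restart of Node B, `deliver_hints` would have copied the five missed rows to Node A.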

You may see the count flip-flopping because your query is served
alternately by Node A and Node B (you can verify this with TRACING ON in cqlsh).
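
A similarly simplified sketch of the flip-flopping: at CL=ONE the coordinator answers from a single replica, so the count you see depends on which replica is asked. The strict round-robin choice below is an assumption for illustration (real replica selection depends on the snitch); the replica contents match your test.

```python
from itertools import cycle

# Replica contents after the sequence above: Node A missed the 5 inserts.
replica_rows = {
    "A": {f"row{i}" for i in range(10)},
    "B": {f"row{i}" for i in range(15)},
}

# Toy coordinator: at CL=ONE only one replica is asked; here we simply alternate.
chosen = cycle(["A", "B"])


def count_cl_one():
    """Count rows using a single replica, as a CL=ONE read would."""
    return len(replica_rows[next(chosen)])


def count_all_replicas():
    """Read both replicas (QUORUM or ALL when RF=2) and merge.

    For this insert-only example, merging is just the union of the row keys.
    """
    return len(set().union(*replica_rows.values()))


print([count_cl_one() for _ in range(4)])  # [10, 15, 10, 15]
print(count_all_replicas())                # 15
```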

Keep in mind that, due to Cassandra's architecture, missed writes leave the
replicas inconsistent. There are mechanisms that help mitigate this, such as
the hinted handoff mentioned above and read repair, but in the end the only
way to guarantee consistent data is to run a repair (nodetool repair). These
mechanisms cannot operate reliably if the entire cluster goes down, which is
what happens in your scenario between the steps above.
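
What a repair ultimately achieves can be sketched as a last-write-wins merge. This only illustrates the end state under simplifying assumptions (each row is a value plus a write timestamp, no deletes or tombstones); `repair` is a hypothetical helper, and nodetool repair itself works by comparing Merkle trees and streaming the mismatched ranges.

```python
def repair(replica_a, replica_b):
    """Toy anti-entropy: both replicas end up with the newest version of every row.

    Each replica maps primary key -> (value, write_timestamp).
    """
    merged = dict(replica_a)
    for key, (value, ts) in replica_b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    # Overwrite both replicas with the reconciled data.
    replica_a.clear()
    replica_a.update(merged)
    replica_b.clear()
    replica_b.update(merged)


node_a = {f"row{i}": ("v1", 100) for i in range(10)}
node_b = dict(node_a)
node_b.update({f"row{i}": ("v1", 200) for i in range(10, 15)})  # inserts Node A missed

repair(node_a, node_b)
print(len(node_a), len(node_b), node_a == node_b)  # 15 15 True
```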


On Mon, Nov 2, 2015 at 12:46 PM, Luis Miguel <arbox_@hotmail.com> wrote:

> Thanks for your answer!
> I thought that bootstrapping is executed only when you add a node to the
> cluster for the first time, and that after that gossip is the method used
> to discover the cluster members again... In my case I thought that it was
> more of a read repair issue. Am I wrong?
>
> ------------------------------
> Date: Mon, 2 Nov 2015 21:12:20 +0100
> Subject: Re: FW: Two node cassandra cluster doubts
> From: ichi.sara@gmail.com
> To: user@cassandra.apache.org
>
>
> I think that this is normal behaviour, since you shut down your seed and
> then rebooted it. You should know that when you start a seed node, it
> doesn't bootstrap, which means it doesn't check whether the contents of the
> tables have changed. In your tests, you shut down node A before doing the
> inserts and started it afterwards, so node A doesn't have the new rows you
> inserted. And yes, it is normal to get different results from your query
> each time, because the coordinator node changes and therefore the query is
> executed on a different node each time (when node B answers you get 15
> rows, and when node A does you get 10 rows).
> On 2 Nov 2015 19:22, "Luis Miguel" <arbox_@hotmail.com> wrote:
>
> Hello!
>
> I have set up a Cassandra cluster with two nodes, Node A and Node B, with
> RF=2, read CL=ONE and write CL=ONE.
>
> Node A is the seed.
>
>
> At first everything works well: when I add/delete/update entries on Node A,
> everything is replicated on Node B and vice versa. Even if I shut down
> node A, make new insertions on Node B in the meantime, and then start node A
> again, Cassandra recovers OK... BUT there is ONE case where this fails. I am
> going to describe the process:
>
> Node A and Node B are in sync.
>
> Select Count (*) From MYTABLE;---> 10 rows
>
> Shut down Node A.
>
> Made some inserts on Node B.
>
> Select Count (*) From MYTABLE;---> 15 rows
>
> Shut down Node B.
>
> Start Up Node B.
>
> Select Count (*) From MYTABLE;---> 15 rows
>
> (Everything OK so far.)
>
> Start Up Node A.
>
> Select Count (*) From MYTABLE;---> 10 rows (hmm... this is weird... check
> it again)
> Select Count (*) From MYTABLE;---> 15 rows (wow! this is correct, let's
> try again)
> Select Count (*) From MYTABLE;---> 10 rows (OK... the values are dancing)
>
> If I run the same queries on Node B, it behaves the same way, and it is
> only solved with a nodetool repair, but I would prefer automatic
> failover.
>
> Is there any way to avoid this, or is running nodetool repair
> mandatory?
>
> Thanks in advance!!!
>
>
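
Regarding the settings in the question (RF=2, CL ONE for both reads and writes): the usual rule of thumb is that a read is guaranteed to overlap the latest acknowledged write only when R + W > RF, where R and W are the numbers of replicas the read and write consistency levels require. The helpers below are hypothetical and only do that arithmetic, with QUORUM = floor(RF/2) + 1 (in a single data center, LOCAL_QUORUM needs the same number of replicas).

```python
def quorum(rf):
    """Replicas needed for QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def reads_see_latest_write(rf, read_replicas, write_replicas):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


def tolerated_failures(rf, replicas_needed):
    """Replica failures a request can survive and still meet its consistency level."""
    return rf - replicas_needed


# Settings from the question: RF=2, read CL=ONE, write CL=ONE.
print(reads_see_latest_write(2, 1, 1))              # False
# RF=2 with QUORUM: reads and writes overlap, but no replica may be down.
print(quorum(2), tolerated_failures(2, quorum(2)))  # 2 0
# RF=3 with QUORUM: reads and writes overlap and one replica may be down.
print(reads_see_latest_write(3, quorum(3), quorum(3)),
      tolerated_failures(3, quorum(3)))             # True 1
```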

--001a11c30caa8df06d0523bdb12c--