Date: Wed, 4 Nov 2015 13:28:38 -0800
Subject: Re: Two node cassandra cluster doubts
From: Bryan Cheng
To: user@cassandra.apache.org

I believe what's going on here is this step:

Select Count (*) From MYTABLE;---> 15 rows

Shut down Node B.

Start Up Node B.

Select Count (*) From MYTABLE;---> 15 rows


To understand why this is an issue, consider the way that consistency is attempted within Cassandra. With RF=2 (you should really use an odd-numbered RF and LOCAL_QUORUM so you can tolerate a node failure, but that's another thing), your write is hitting Node B and being queued for writing to Node A via a process called hinted handoff. Normally, this handoff occurs when Node A returns to the cluster, up to max_hint_window_in_ms later, causing all writes it missed to be replayed and integrated. However, since Node B also goes down during this time period, it loses the queued hints and therefore Node A never gets that write.
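
To put numbers on the RF and quorum remark, here is a small Python sketch of the arithmetic. It uses the usual quorum formula, floor(RF / 2) + 1; the function names are made up for illustration and are not part of any Cassandra API.

```python
def quorum(rf: int) -> int:
    """Replicas that must answer a QUORUM request: floor(RF / 2) + 1."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while a QUORUM request still succeeds."""
    return rf - quorum(rf)


def overlap_guaranteed(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


for rf in (2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")

# The setup from this thread: RF=2 with reads and writes at CL=ONE.
print("RF=2, R=1, W=1 overlap:", overlap_guaranteed(2, 1, 1))
# An RF=3 setup with quorum reads and writes.
print("RF=3, R=2, W=2 overlap:", overlap_guaranteed(3, 2, 2))
```

With RF=2 a quorum needs both replicas, so it tolerates no failures, and CL=ONE on both reads and writes gives no guarantee that a read touches a replica that saw the write.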
You may see this flip-flopping because your query hits Node A and Node B alternately (you can use tracing to verify this).
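
The sequence described above can be replayed with a toy model in Python. This is only a sketch of the explanation in this message, not of how Cassandra actually stores or replays hints: the `Node` class, the `keep_hints` switch and the strict alternation between replicas are all assumptions made for illustration.

```python
from itertools import cycle


class Node:
    """Toy replica: a set of row ids plus hints queued for other nodes."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = set(rows)
        self.up = True
        self.hints = []  # (target node name, row) pairs for replicas that were down

    def restart(self, keep_hints):
        # Assumption: keep_hints toggles whether queued hints survive the restart.
        if not keep_hints:
            self.hints.clear()
        self.up = True


def write(coordinator, replicas, row):
    """Apply the write to live replicas; queue a hint for each one that is down."""
    for node in replicas:
        if node.up:
            node.rows.add(row)
        else:
            coordinator.hints.append((node.name, row))


def replay_hints(coordinator, nodes_by_name):
    """Deliver queued hints to targets that are back up; keep the rest."""
    remaining = []
    for target, row in coordinator.hints:
        node = nodes_by_name[target]
        if node.up:
            node.rows.add(row)
        else:
            remaining.append((target, row))
    coordinator.hints = remaining


def run_scenario(keep_hints):
    a, b = Node("A", range(10)), Node("B", range(10))
    nodes = {"A": a, "B": b}
    a.up = False                      # shut down Node A
    for row in range(10, 15):         # five inserts through Node B
        write(b, [a, b], row)
    b.up = False                      # shut down Node B ...
    b.restart(keep_hints)             # ... and start it again
    a.up = True                       # start up Node A
    replay_hints(b, nodes)
    readers = cycle([a, b])           # CL=ONE reads alternating between replicas
    return [len(next(readers).rows) for _ in range(3)]


print(run_scenario(keep_hints=False))  # [10, 15, 10]
print(run_scenario(keep_hints=True))   # [15, 15, 15]
```

When the hints are dropped, the three counts reproduce the 10 / 15 / 10 pattern from the original report; when they survive, Node A catches up and every read agrees.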

Keep in mind that due to Cassandra's architecture, missing writes will result in inconsistent data. There are mechanisms to help mitigate this, for example the aforementioned hinted handoff, or read repair. However, at the end of the day the only way to ensure consistent data is a repair. These mechanisms cannot operate reliably if the entire cluster goes down, which happens in your scenario between the above steps.
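
As a rough picture of what a repair achieves, the Python sketch below reconciles two replicas so that both end up with the newest version of every row. It is a deliberate simplification: a real repair compares ranges of data between replicas rather than walking every key, and the dictionaries and the `repair` function here are hypothetical stand-ins, not Cassandra code.

```python
def repair(replica_a, replica_b):
    """Make both replicas hold the newest (timestamp, value) cell for every key."""
    merged = dict(replica_a)
    for key, cell in replica_b.items():
        # Last write wins: keep the cell with the higher write timestamp.
        if key not in merged or cell[0] > merged[key][0]:
            merged[key] = cell
    for replica in (replica_a, replica_b):
        replica.clear()
        replica.update(merged)


# Both nodes start with the same 10 rows, written at timestamp 1.
node_a = {key: (1, f"row{key}") for key in range(10)}
node_b = {key: (1, f"row{key}") for key in range(10)}
# Node B alone receives 5 more rows at timestamp 2 while Node A is down.
node_b.update({key: (2, f"row{key}") for key in range(10, 15)})

print(len(node_a), len(node_b))  # 10 15
repair(node_a, node_b)
print(len(node_a), len(node_b))  # 15 15
```

After the merge both replicas return 15 rows regardless of which one answers the query.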



On Mon, Nov 2, 2015 at 12:46 PM, Luis Miguel <arbox_@hotmail.com> wrote:
Thanks for your answer!
I thought that bootstrapping is executed only when you add a node to the cluster for the first time; after that, I thought that gossip is the method used to discover the cluster members again.... In my case I thought that it was more about a read repair issue.., am I wrong?


Date: Mon, 2 Nov 2015 21:12:20 +0100
Subject: Re: FW: Two node cassandra cluster doubts
From: ichi.sara@gmail.com
To: user@cassandra.apache.org


I think that this is normal behaviour, as you shut down your seed and then rebooted it. You should know that when you start a seed node it doesn't do the bootstrapping thing, which means it doesn't check whether there are changes in the contents of the tables. In your tests, you shut down node A before doing the inserts and started it afterwards, so your node A doesn't have the new rows you inserted. And yes, it is normal to get different values from your query each time, because the coordinator node changes and therefore the query is executed on a different node each time (when node B answers you get 15 rows, and when node A does you get 10 rows).

On 2 Nov 2015 at 19:22, "Luis Miguel" <arbox_@hotmail.com> wrote:
Hello!

I have set up a Cassandra cluster with two nodes, Node A and Node B --> RF=2, Read CL=1 and Write CL=1;

Node A is seed...


At first everything works well: when I add/delete/update entries on Node A, everything is replicated to Node B and vice versa. Even if I shut down Node A, make new insertions on Node B in the meantime, and then start Node A up again, Cassandra recovers OK.... BUT there is ONE case where this fails.... I am going to describe the process:

Node A and Node B are in sync.

Select Count (*) From MYTABLE;---> 10 rows

Shut down Node A.

Made some inserts on Node B.

Select Count (*) From MYTABLE;---> 15 rows

Shut down Node B.
Start Up Node B.

Select Count (*) From MYTABLE;---> 15 rows

(Everything OK so far).

Start Up Node A.

Select Count (*) From MYTABLE;---> 10 rows (uhmmm...this is weird...check it again)
Select Count (*) From MYTABLE;---> 15 rows (wow!..this is correct, let's try again)
Select Count (*) From MYTABLE;---> 10 rows (Ok...values are dancing)

If I make the same queries on Node B, it behaves the same way.... and it is only solved with a nodetool repair... but I would prefer an automatic fail-over...

Is there any way to avoid this??? Or is running nodetool repair mandatory???

Thanks in advance!!!