MIME-Version: 1.0
References: <CAFgXP-zzsZivYtR0P3EcBNVU7Hak1mChBxnhfkuTOjBZkz8EiQ@mail.gmail.com>
 <B55F9FEA-B840-4FFC-A5B1-AF49AEFD7A9F@gmail.com> <CAFgXP-w23xDTc=y9K5E3UyDx+f+nZK1fQgWQtaPJ2T4NJWCBEg@mail.gmail.com>
 <CA+EmchkdSkLjAK6EHF+cQHX7ab9ASsCFJaaRh7t-dfhuNyORCQ@mail.gmail.com>
In-Reply-To: <CA+EmchkdSkLjAK6EHF+cQHX7ab9ASsCFJaaRh7t-dfhuNyORCQ@mail.gmail.com>
From: Jonathan Haddad <jon@jonhaddad.com>
Date: Tue, 8 Jan 2019 09:47:18 -0800
Message-ID: <CACUnPaBEBLDrMAgamcjyQpq8xd-Vr4UXn+texM85A=_3egzFgA@mail.gmail.com>
Subject: Re: How seed nodes are working and how to upgrade/replace them?
To: user <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary="00000000000074f5ff057ef5f0d2"

--00000000000074f5ff057ef5f0d2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I've done some gossip simulations in the past and found virtually no
difference in the time it takes for messages to propagate in clusters of
almost any size.  IIRC it always converges by 17 iterations.  Thus, I
completely agree with Jeff's comment here.  If you aren't pushing 800-1000
nodes, it's not even worth bothering with.  Just be sure you have seeds in
each DC.
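
A rough way to see why propagation time barely changes with cluster size
is to simulate it. The sketch below is not the simulation mentioned above
and is not Cassandra's actual gossip protocol; it assumes a simplified
push-only model in which every informed node tells one random peer per
round. The name `rounds_to_converge` is made up for illustration.

```python
import random


def rounds_to_converge(n_nodes, rng):
    """Count rounds until an update known to node 0 has reached every node.

    Assumed model (a simplification, not Cassandra's real gossip):
    each round, every informed node pushes the update to one peer
    chosen uniformly at random among the other nodes.
    """
    informed = {0}  # node 0 is the only one that starts with the update
    rounds = 0
    while len(informed) < n_nodes:
        newly_informed = set()
        for node in informed:
            # Pick a random peer other than the node itself.
            peer = rng.randrange(n_nodes - 1)
            if peer >= node:
                peer += 1
            newly_informed.add(peer)
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs are comparable
    for n in (10, 100, 1000):
        print(n, "nodes:", rounds_to_converge(n, rng), "rounds")
```

In this model the round count grows roughly logarithmically with the node
count, so going from 100 to 1000 nodes adds only a handful of rounds.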

Something to be aware of - there's only a chance to gossip with a seed.
That chance goes down as cluster size increases, meaning seeds have less
and less of an impact as the cluster grows.  Once you get to 100+ nodes, a
given node is very rarely talking to a seed.
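
To put rough numbers on that, here is a small sketch under an assumed
model: per gossip round a non-seed node picks one random peer, and
independently contacts a seed with probability (number of seeds) /
(number of peers). Both the model and the name
`seed_contact_probability` are illustrative assumptions, not a
description of Cassandra's exact behaviour.

```python
def seed_contact_probability(n_nodes, n_seeds):
    """Chance per round that a non-seed node talks to at least one seed.

    Assumed model: the random peer is a seed with probability
    n_seeds / peers, and an extra seed contact happens independently
    with the same probability (capped at 1).
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    peers = n_nodes - 1
    p = min(1.0, n_seeds / peers)
    # Probability that at least one of the two independent events occurs.
    return 1 - (1 - p) ** 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "nodes, 3 seeds:", round(seed_contact_probability(n, 3), 4))
```

With a fixed number of seeds the probability shrinks roughly in
proportion to 1 / cluster size, which is the effect described above.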

Just make sure that when you first start a node it's not in its own seed
list (a node that lists itself as a seed won't bootstrap) and you're good.
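
That check is easy to automate in config management. A minimal sketch,
assuming the seed list is available as the comma-separated string used by
the `seeds` parameter in cassandra.yaml; the helper name `is_own_seed` is
hypothetical, and it only compares strings (it does not resolve
hostnames):

```python
def is_own_seed(node_address, seeds_param):
    """Return True if node_address appears in the comma-separated seed list.

    Plain string comparison only: a hostname and its IP address are
    treated as different entries.
    """
    seeds = {s.strip() for s in seeds_param.split(",") if s.strip()}
    return node_address in seeds


if __name__ == "__main__":
    seeds = "10.0.0.1, 10.0.0.2"
    # A new node should fail this check before its first start.
    print(is_own_seed("10.0.0.3", seeds))
    print(is_own_seed("10.0.0.2", seeds))
```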


On Tue, Jan 8, 2019 at 9:39 AM Jeff Jirsa <jjirsa@gmail.com> wrote:

>
>
> On Tue, Jan 8, 2019 at 8:19 AM Jonathan Ballet <jballet@edgelab.ch> wrote:
>
>> Hi Jeff,
>>
>> Thanks for answering most of my points!
>> From the reloadseeds ticket, I followed the link to
>> https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very
>> instructive, although a bit old.
>>
>>
>> On Mon, 7 Jan 2019 at 17:23, Jeff Jirsa <jjirsa@gmail.com> wrote:
>>
>>> > On Jan 7, 2019, at 6:37 AM, Jonathan Ballet <jballet@edgelab.ch>
>>> wrote:
>>> >
>>> [...]
>>>
>>> >   In essence, in my example that would be:
>>> >
>>> >   - decide that #2 and #3 will be the new seed nodes
>>> >   - update the configuration files of all the nodes to list the IP
>>> >     addresses of #2 and #3
>>> >   - DON'T restart any node - the new seed configuration will be
>>> >     picked up only if the Cassandra process restarts
>>> >
>>> > * If I can manage to sort my Cassandra nodes by their age, could it
>>> >   be a strategy to have the seeds set to the 2 oldest nodes in the
>>> >   cluster? (This implies these nodes would change as the cluster's
>>> >   nodes get upgraded/replaced).
>>>
>>> You could do this, but it seems like a lot of headache for little
>>> benefit. It could be done with the simple seed provider and config
>>> management (puppet/chef/ansible) laying down new yaml, or with your
>>> own seed provider.
>>>
>>
>> So, just to make it clear: sorting by age isn't a goal in itself, it was
>> just an example of how I could get a stable list.
>>
>> Right now, we have a dedicated group of seed nodes + a dedicated group
>> of non-seeds: doing a rolling upgrade of the nodes from the second
>> group is relatively painless (although slow), whereas for the first
>> group we are facing the issues discussed in CASSANDRA-3829: seed nodes
>> do not bootstrap automatically, so we need to operate them in a more
>> careful way.
>>
>>
> Rolling upgrade shouldn't need to re-bootstrap. Only replacing a host
> should need a new bootstrap. That should be a new host in your list, so
> it seems like this should be fairly rare?
>
>
>> What I'm really looking for is a way to simplify adding nodes to and
>> removing nodes from our (small) cluster: I can easily provide a small
>> list of nodes from our cluster with our config management tool so that
>> new nodes discover the rest of the cluster, but the documentation seems
>> to imply that seed nodes also have other functions and I'm not sure
>> what problems we could face trying to simplify this approach.
>>
>> Ideally, what I would like to have would be:
>>
>> * Considering a stable cluster (no new nodes, no nodes leaving), the N
>>   seeds should always be the same N nodes
>> * Adding new nodes should not change that list
>> * Stopping/removing one of these N nodes should "promote" another
>>   (non-seed) node as a seed
>>   - that would not restart the already running Cassandra nodes but
>>     would update their configuration files.
>>   - if a node restarts for whatever reason it would pick up this new
>>     configuration
>>
>> So: no node would start its life as a seed, only a few already existing
>> nodes would have this status. We would not have to deal with the "a seed
>> node doesn't bootstrap" problem and it would make our operation process
>> simpler.
>>
>>
>>> > I also have some more general questions about seed nodes and how
>>> > they work:
>>> >
>>> > * I understand that seed nodes are used when a node starts and needs
>>> >   to discover the rest of the cluster's nodes. Once the node has
>>> >   joined and the cluster is stable, are seed nodes still playing a
>>> >   role in day-to-day operations?
>>>
>>> They're used probabilistically in gossip to encourage convergence.
>>> Mostly useful in large clusters.
>>>
>>
>> How "large" are we talking here? At how many nodes would a cluster
>> start to be considered "large"?
>>
>
> ~800-1000
>
>
>> Also, about the convergence: is this related to how fast/often the
>> cluster topology is changing? (new nodes, leaving nodes, underlying IP
>> addresses changing, etc.)
>>
>>
> New nodes, nodes going up/down, and schema propagation.
>
>
>> Thanks for your answers!
>>
>>  Jonathan
>>
>

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

--00000000000074f5ff057ef5f0d2--