From: Carl Mueller
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Date: Wed, 12 Jun 2019 16:11:03 -0500
Subject: Re: postmortem on 2.2.13 scale out difficulties
I posted a bug, CASSANDRA-15155:
https://issues.apache.org/jira/browse/CASSANDRA-15155?jql=project%20%3D%20CASSANDRA

It seems VERY similar to
https://issues.apache.org/jira/browse/CASSANDRA-6648

On Wed, Jun 12, 2019 at 12:14 PM Carl Mueller wrote:

> And once the cluster token map formation is done, it starts bootstrap and we get a ton of these:
>
> WARN  [MessagingService-Incoming-/2406:da14:95b:4503:910e:23fd:dafa:9983] 2019-06-12 15:22:04,760 IncomingTcpConnection.java:100 - UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=df425400-c331-11e8-8b96-4b7f4d58af68
>
> And then after LOTS of those:
>
> INFO  [main] 2019-06-12 15:23:25,515 StorageService.java:1142 - JOINING: Starting to bootstrap...
> INFO  [main] 2019-06-12 15:23:25,525 StreamResultFuture.java:87 - [Stream #05af9ee0-8d26-11e9-85c1-bd5476090c54] Executing streaming plan for Bootstrap
> INFO  [main] 2019-06-12 15:23:25,526 StorageService.java:1199 - Bootstrap completed! for the tokens [-7314981925085449175, ... bunch of tokens ... 5499447097629838103]
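A "Bootstrap completed!" logged 11 milliseconds after "Starting to bootstrap..." means the node streamed nothing before declaring itself joined. A rough way to catch that on a newly added node before it takes reads is sketched below; it assumes a stock package install, so the log path may need adjusting:

```
# Did bootstrap actually stream, or did it "complete" instantly?
grep -E "JOINING: Starting to bootstrap|Bootstrap completed" /var/log/cassandra/system.log

# Does every node agree on a single schema version? A joiner stuck on its
# own schema version (as gossipinfo showed here) is a red flag.
nodetool describecluster

# Is the new node's Load plausible next to its neighbours, or near-empty?
nodetool status
```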
> On Wed, Jun 12, 2019 at 12:07 PM Carl Mueller <carl.mueller@smartthings.com> wrote:
>
>> One node at a time: yes, that is what we are doing.
>>
>> We have not tried the streaming_socket_timeout_in_ms. It is currently 24 hours (```streaming_socket_timeout_in_ms=86400000```), which would cover the bootstrap timeframe we have seen before (1-2 hours per node).
>>
>> Since it joins with no data, it is serving erroneous data. We may try bootstrap rejoin and the JVM_OPT.... The node appears to think it has bootstrapped even though gossipinfo shows the new node has a different schema version.
>>
>> We had scaled EU and US from 5 --> 25 without incident (one at a time); since we increased ring_delay_ms it has worked haphazardly, getting us four joins, and since then failure.
>>
>> The debug log shows:
>>
>> DEBUG [GossipStage:1] 2019-06-12 15:20:08,559 StorageService.java:1998 - New node /2a05:d018:af:1108:86f4:d628:6bca:6983 at token 9200286188287490229
>> DEBUG [GossipStage:1] 2019-06-12 15:20:08,559 StorageService.java:1998 - New node /2a05:d018:af:1108:86f4:d628:6bca:6983 at token 950856676715905899
>> DEBUG [GossipStage:1] 2019-06-12 15:20:08,563 MigrationManager.java:96 - Not pulling schema because versions match or shouldPullSchemaFrom returned false
>> INFO  [GossipStage:1] 2019-06-12 15:20:08,563 TokenMetadata.java:464 - Updating topology for /2a05:d018:af:1108:86f4:d628:6bca:6983
>> INFO  [GossipStage:1] 2019-06-12 15:20:08,564 TokenMetadata.java:464 - Updating topology for /2a05:d018:af:1108:86f4:d628:6bca:6983
>> DEBUG [GossipStage:1] 2019-06-12 15:20:08,565 MigrationManager.java:96 - Not pulling schema because versions match or shouldPullSchemaFrom returned false
>> INFO  [GossipStage:1] 2019-06-12 15:20:08,565 Gossiper.java:1027 - Node /2600:1f18:4b4:5903:64af:955e:b65:8d83 is now part of the cluster
>> DEBUG [GossipStage:1] 2019-06-12 15:20:08,587 StorageService.java:1928 - Node /2600:1f18:4b4:5903:64af:955e:b65:8d83 state NORMAL, token [-1028768087263234868, ...<bunch of tokens>..., 921670352349030554]
>> DEBUG [GossipStage:1] 2019-06-12 15:20:08,588 StorageService.java:1998 - New node /2600:1f18:4b4:5903:64af:955e:b65:8d83 at token -1028768087263234868
>> DEBUG [GossipStage:1] 2019-06-12 15:20:08,588 StorageService.java:1998 - New node /2600:1f18:4b4:5903:64af:955e:b65:8d83 at token -1045740236536355596
>> DEBUG [GossipStage:1] 2019-06-12 15:20:08,589 StorageService.java:1998 - New node /2600:1f18:4b4:5903:64af:955e:b65:8d83 at token -1184422937682103096
>> DEBUG [GossipStage:1] 2019-06-12 15:20:08,589 StorageService.java:1998 - New node /2600:1f18:4b4:5903:64af:955e:b65:8d83 at token -1201924032068728250
>>
>> All the nodes appear to be reporting "Not pulling schema because versions match or shouldPullSchemaFrom returned false". That code (MigrationManager.java) makes reference to a "gossip only" node; did we get stuck in that somehow?
>>
>> On Wed, Jun 12, 2019 at 11:45 AM ZAIDI, ASAD A wrote:
>>
>>> Adding one node at a time – is that successful?
>>>
>>> Check the value of the streaming_socket_timeout_in_ms parameter in cassandra.yaml and increase it if needed.
>>>
>>> Have you tried nodetool bootstrap resume and the JVM option, i.e. JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"?
>>>
>>> From: Carl Mueller [mailto:carl.mueller@smartthings.com.INVALID]
>>> Sent: Wednesday, June 12, 2019 11:35 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: postmortem on 2.2.13 scale out difficulties
>>>
>>> We were only able to scale out four nodes and then failures started occurring, including multiple instances of nodes joining a cluster without streaming.
>>>
>>> Sigh.
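For reference, the suggestions above translate to the joining node roughly as follows. This is a sketch: the cassandra.yaml path assumes a stock package layout, and the timeout value shown is simply the 24-hour setting already in place on this cluster rather than a recommendation:

```
# cassandra.yaml -- the stream timeout being asked about; here it is already 24h:
# streaming_socket_timeout_in_ms: 86400000
grep streaming_socket_timeout_in_ms /etc/cassandra/cassandra.yaml

# cassandra-env.sh -- relax the consistent-range-movement check for the next
# (re)bootstrap attempt, as suggested:
JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"

# If a bootstrap stalled mid-stream (rather than "completing" with no data),
# it can be resumed instead of wiping and rejoining the node:
nodetool bootstrap resume
```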
>>> On Tue, Jun 11, 2019 at 3:11 PM Carl Mueller <carl.mueller@smartthings.com> wrote:
>>>
>>> We had a three-DC (asia-tokyo/europe/us) cassandra 2.2.13 cluster, AWS, IPV6.
>>>
>>> We needed to scale out the asia datacenter, which was 5 nodes; europe and us were 25 nodes.
>>>
>>> We were running into bootstrapping issues where the new node failed to bootstrap/stream; it failed with
>>>
>>> "java.lang.RuntimeException: A node required to move the data consistently is down"
>>>
>>> ...even though they were all up based on nodetool status prior to adding the node.
>>>
>>> First we increased the phi_convict_threshold to 12, and that did not help.
>>>
>>> CASSANDRA-12281 appeared similar to what we had problems with, but I don't think we hit that. Somewhere in there someone wrote:
>>>
>>> "For us, the workaround is either deleting the data (then bootstrap again), or increasing the ring_delay_ms. And the larger the cluster is, the longer ring_delay_ms is needed. Based on our tests, for a 40 nodes cluster, it requires ring_delay_ms to be >50 seconds. For a 70 nodes cluster, >100 seconds. Default is 30 seconds."
>>>
>>> Given the WAN nature of our DCs, we set ring_delay_ms to 100 seconds and it finally worked.
>>>
>>> Side note: during the rolling restarts for setting phi_convict_threshold we observed quite a lot of status map variance between nodes (we have a program that polls a datacenter's or cluster's view of gossipinfo and statuses). AWS appears to have variance in networking, per the phi_convict_threshold advice; I'm not sure whether our difficulties were typical in that regard, and/or whether our IPV6 and/or globally distributed datacenters were exacerbating factors.
>>>
>>> We could not reproduce this in loadtest, although loadtest is only eu and us (but is IPV6).
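The two knobs in play here live in different places: phi_convict_threshold is a cassandra.yaml setting, while ring_delay_ms is a startup system property. Roughly, as a sketch assuming a stock install layout, and using the values from this thread rather than general recommendations:

```
# cassandra.yaml -- raise the failure detector's conviction threshold for
# flaky / WAN links (12 was tried here and did not help on its own):
# phi_convict_threshold: 12

# cassandra-env.sh (or the startup command line) -- give gossip more time to
# settle before the joining node builds its streaming plan; 100 s is what
# finally worked for this cluster, the default is 30 s:
JVM_OPTS="$JVM_OPTS -Dcassandra.ring_delay_ms=100000"

# Both changes take effect on the next restart of the node being (re)added.
```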
