From user-return-59890-archive-asf-public=cust-asf.ponee.io@cassandra.apache.org  Tue Feb 20 21:39:47 2018
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:user-help@cassandra.apache.org>
List-Unsubscribe: <mailto:user-unsubscribe@cassandra.apache.org>
List-Post: <mailto:user@cassandra.apache.org>
List-Id: <user.cassandra.apache.org>
Reply-To: user@cassandra.apache.org
Delivered-To: mailing list user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <41268749-9B74-4124-8582-A920293ACB27@gmail.com>
References: <CAB0qMNhFekwVXdAaSoxkwQthhi7YAZPHc=s4c2kMNBHhbCk5YQ@mail.gmail.com>
 <CA+EmchktLsFcm11_NL3_ZpWVGF_F9yv0o4ceQS8HGqG_UJH76g@mail.gmail.com> <41268749-9B74-4124-8582-A920293ACB27@gmail.com>
From: Jeff Jirsa <jjirsa@gmail.com>
Date: Tue, 20 Feb 2018 12:39:21 -0800
Message-ID: <CA+Emchnq1bRggLiY7ziK355b+_hr-BKje7LDqzj-Y4G29c31rA@mail.gmail.com>
Subject: Re: Is it possible / makes it sense to limit concurrent streaming
 during bootstrapping new nodes?
To: cassandra <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary="94eb2c09543a5efa9c0565aacfb5"

--94eb2c09543a5efa9c0565aacfb5
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

At a past job, we set the limit at around 60 hosts per cluster: anything
bigger than that got a single token per host, and for anything smaller
we'd just tolerate the inconveniences of vnodes. But that was before the
new vnode token allocation went into 3.0, and it assumed things that may
not be true for you (it was a cluster that started at 60 hosts and grew
to 480 in steps, so we wanted to grow quickly; having a single token per
host allowed us to grow from 60 to 120 in 2 days, then from 120 to 180 in
2 days, and so on).
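
To make the single-token math concrete, here is a rough Python sketch
(not the tooling we actually used). It assumes Murmur3Partitioner, whose
tokens run from -2**63 to 2**63 - 1, and the function names are made up
for illustration. It computes evenly spaced tokens for a ring and, for a
doubling step such as 60 to 120, the midpoint token for each new host, so
every new host splits exactly one existing range.

```python
# Assumption: Murmur3Partitioner, tokens span -2**63 .. 2**63 - 1.
RING_SIZE = 2**64
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def even_tokens(node_count):
    """Evenly spaced tokens, one per host (hypothetical helper)."""
    step = RING_SIZE // node_count
    return [MIN_TOKEN + i * step for i in range(node_count)]


def doubling_tokens(existing):
    """Midpoint of every existing range, one token per new host."""
    ring = sorted(existing)
    new_tokens = []
    for i, left in enumerate(ring):
        # The last range wraps around the ring back to the first token.
        right = ring[(i + 1) % len(ring)]
        # A single-host ring has width 0 here; treat it as the full ring.
        width = (right - left) % RING_SIZE or RING_SIZE
        mid = left + width // 2
        if mid > MAX_TOKEN:
            mid -= RING_SIZE
        new_tokens.append(mid)
    return new_tokens


old = even_tokens(60)
new = doubling_tokens(old)
print(len(old), len(new))
print(new[:3])
```

Each computed value would go into initial_token (with num_tokens set to
1) in the new host's cassandra.yaml before it bootstraps.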

Are you always going to be growing, or is it a short/temporary thing?
There are users of vnodes (at big, public companies) that go up into the
hundreds of nodes.

Most people running Cassandra start sharding clusters rather than going
past a thousand or so nodes. I know of at least one person I talked to on
IRC with a 1700-host cluster, but that'd be beyond what I'd ever do
personally.
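
On the vnode side, the effect of num_tokens on the number of stream
sources (the point in my earlier mail quoted below) can be approximated
with a toy simulation. This is a simplification, not Cassandra's actual
streaming logic: it places tokens at random, ignores replication (each
range has one owner), and only counts whose range each new token splits.
stream_sources is a hypothetical helper.

```python
import bisect
import random

RING_SIZE = 2**64


def stream_sources(node_count, num_tokens, rng):
    """Count distinct current owners of the ranges a new node splits."""
    ring = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(node_count)
        for _ in range(num_tokens)
    )
    tokens = [token for token, _ in ring]
    sources = set()
    for _ in range(num_tokens):
        new_token = rng.randrange(RING_SIZE)
        # A token owns the range ending at it, so the new token splits the
        # range of the next token clockwise (wrapping at the ring's end).
        idx = bisect.bisect_left(tokens, new_token) % len(ring)
        sources.add(ring[idx][1])
    return len(sources)


rng = random.Random(42)
for num_tokens in (1, 4, 16, 256):
    print(num_tokens, stream_sources(15, num_tokens, rng))
```

With 15 existing nodes, the count can never exceed num_tokens, which is
why a small num_tokens caps the number of hosts a joining node streams
from in this model.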


On Tue, Feb 20, 2018 at 12:34 PM, J=C3=BCrgen Albersdorfer <
jalbersdorfer@gmail.com> wrote:

> Thanks Jeff,
> your answer is really not what I expected to hear: it means more manual
> work again as soon as we start really using C*. But I'm happy to learn
> this now, while there is still time to acquire the necessary skills and
> ask the right questions about how to run big data on C* correctly before
> we actually start using it, and I'm glad to have people like you around
> who care about these questions. Thanks. This still convinces me that I
> have bet on the right horse, even if it might become a rough ride.
>
> By the way, is it possible to migrate towards smaller token ranges?
> What is the recommended way of doing so? And at what number of nodes is
> the typical 'break-even' point?
>
> Sent from my iPhone
>
> On 20.02.2018 at 21:05, Jeff Jirsa <jjirsa@gmail.com> wrote:
>
> The scenario you describe is the typical point where people move away
> from vnodes and towards single-token-per-node (or a much smaller number
> of vnodes).
>
> The default setting puts you in a situation where virtually all hosts are
> adjacent/neighbors to all others (at least until you're way into the
> hundreds of hosts), which means you'll stream from nearly all hosts. If
> you drop the number of vnodes from ~256 to ~4 or ~8 or ~16, you'll see
> the number of streams drop as well.
>
> Many people with "large" clusters statically allocate tokens to make it
> predictable - if you have a single token per host, you can add multiple
> hosts at a time, each streaming from a small number of neighbors, without
> overlap.
>
> It takes a bit more tooling (or manual token calculation) outside of
> cassandra, but works well in practice for "large" clusters.
>
>
>
>
> On Tue, Feb 20, 2018 at 4:42 AM, J=C3=BCrgen Albersdorfer <
> jalbersdorfer@gmail.com> wrote:
>
>> Hi, I'm wondering whether it is possible, and whether it would make
>> sense, to limit concurrent streaming when joining a new node to a
>> cluster.
>>
>> I'm currently operating a 15-node C* cluster (v3.11.1) and joining
>> another node every day.
>> 'nodetool netstats' shows that it always streams data from all other
>> nodes.
>>
>> How far will this scale? What happens when I have hundreds or even
>> thousands of nodes?
>>
>> Does anyone have experience with such a situation?
>>
>> Thanks, and regards
>> J=C3=BCrgen
>>
>
>


--94eb2c09543a5efa9c0565aacfb5--