From: Anthony Grasso <anthony.grasso@gmail.com>
Date: Mon, 3 Feb 2020 13:35:40 +1100
Subject: Re: [EXTERNAL] How to reduce vnodes without downtime
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Hi Sergio,

There is a misunderstanding here. My post makes no recommendation for the value of num_tokens. Rather, it focuses on how to use the allocate_tokens_for_keyspace setting when creating a new cluster.

Whilst a value of 4 is used for num_tokens in the post, it was chosen for demonstration purposes. Specifically, it makes:

- the uneven token distribution in a small cluster very obvious,
- identifying the endpoints displayed in nodetool ring easy, and
- the initial_token setup less verbose and easier to follow.

I will add an editorial note to the post with the above information so there is no confusion about why 4 tokens were used.

I would only consider moving a cluster to 4 tokens if it is larger than 100 nodes. If you read through the paper that Erick mentioned, written by Joe Lynch & Josh Snyder, they show that num_tokens impacts the availability of large-scale clusters.

If you are after more details about the trade-offs between different num_tokens values, please see the discussion on the dev mailing list: "[Discuss] num_tokens default in Cassandra 4.0".

Regards,
Anthony

On Sat, 1 Feb 2020 at 10:07, Sergio <lapostadisergio@gmail.com> wrote:

> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> This is the article with the 4-token recommendation.
> @Erick Ramirez, which is the dev thread for the default 32 tokens recommendation?
>
> Thanks,
> Sergio
>
> On Fri, 31 Jan 2020 at 14:49, Erick Ramirez <flightctlr@gmail.com> wrote:
>
>> There's an active discussion going on right now in a separate dev thread.
>> The current "default recommendation" is 32 tokens. But there's a push for 4
>> in combination with allocate_tokens_for_keyspace from Jon Haddad & co
>> (based on a paper from Joe Lynch & Josh Snyder).
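For reference, the two settings under discussion live in cassandra.yaml. A minimal sketch (the keyspace name is a placeholder, and 4 is the demonstration value from Anthony's post, not a sizing recommendation):

```yaml
# Illustrative cassandra.yaml fragment, not a recommendation.
# allocate_tokens_for_keyspace biases token allocation toward even ownership
# for the replication settings of the named keyspace (a 3.x option).
num_tokens: 4
allocate_tokens_for_keyspace: my_keyspace   # placeholder keyspace name
```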
>> If you're satisfied with the results from your own testing, go with 4
>> tokens. And that's the key -- you must test, test, TEST! Cheers!
>>
>> On Sat, Feb 1, 2020 at 5:17 AM Arvinder Dhillon <dhillonarvi@gmail.com> wrote:
>>
>>> What is the recommended vnodes value now? I read 8 for later Cassandra 3.x.
>>> Is the new recommendation 4 now, even in version 3.x (asking for 3.11)?
>>> Thanks
>>>
>>> On Fri, Jan 31, 2020 at 9:49 AM Durity, Sean R <SEAN_R_DURITY@homedepot.com> wrote:
>>>
>>>> These are good clarifications and expansions.
>>>>
>>>> Sean Durity
>>>>
>>>> From: Anthony Grasso <anthony.grasso@gmail.com>
>>>> Sent: Thursday, January 30, 2020 7:25 PM
>>>> To: user <user@cassandra.apache.org>
>>>> Subject: Re: [EXTERNAL] How to reduce vnodes without downtime
>>>>
>>>> Hi Maxim,
>>>>
>>>> Basically, what Sean suggested is the way to do this without downtime.
>>>>
>>>> To clarify, the three steps following the "Decommission each node in
>>>> the DC you are working on" step should be applied to only the
>>>> decommissioned nodes. So where it says "all nodes" or "every node" it
>>>> applies to only the decommissioned nodes.
>>>>
>>>> In addition, for the step that says "Wipe data on all the nodes", I would
>>>> delete all files in the following directories on the decommissioned nodes.
>>>>
>>>> - data (usually located in /var/lib/cassandra/data)
>>>> - commitlog (usually located in /var/lib/cassandra/commitlog)
>>>> - hints (usually located in /var/lib/cassandra/hints)
>>>> - saved_caches (usually located in /var/lib/cassandra/saved_caches)
>>>>
>>>> Cheers,
>>>>
>>>> Anthony
>>>>
>>>> On Fri, 31 Jan 2020 at 03:05, Durity, Sean R <SEAN_R_DURITY@homedepot.com> wrote:
>>>>
>>>> Your procedure won't work very well. On the first node, if you switched
>>>> to 4, you would end up with only a tiny fraction of the data (because the
>>>> other nodes would still be at 256).
>>>> I updated a large cluster (over 150
>>>> nodes -- 2 DCs) to a smaller number of vnodes. The basic outline was this:
>>>>
>>>> - Stop all repairs
>>>> - Make sure the app is running against one DC only
>>>> - Change the replication settings on keyspaces to use only 1 DC
>>>>   (basically cutting off the other DC)
>>>> - Decommission each node in the DC you are working on. Because the
>>>>   replication settings are changed, no streaming occurs. But it releases
>>>>   the token assignments
>>>> - Wipe data on all the nodes
>>>> - Update configuration on every node to your new settings, including
>>>>   auto_bootstrap = false
>>>> - Start all nodes. They will choose tokens, but not stream any data
>>>> - Update the replication factor for all keyspaces to include the new DC
>>>> - I disabled binary on those nodes to prevent app connections
>>>> - Run nodetool rebuild with -dc (other DC) on as many nodes as your
>>>>   system can safely handle until they are all rebuilt
>>>> - Re-enable binary (and app connections to the rebuilt DC)
>>>> - Turn on repairs
>>>> - Rest for a bit, then reverse the process for the remaining DCs
>>>>
>>>> Sean Durity -- Staff Systems Engineer, Cassandra
>>>>
>>>> From: Maxim Parkachov <lazy.gopher@gmail.com>
>>>> Sent: Thursday, January 30, 2020 10:05 AM
>>>> To: user@cassandra.apache.org
>>>> Subject: [EXTERNAL] How to reduce vnodes without downtime
>>>>
>>>> Hi everyone,
>>>>
>>>> With the discussion about reducing the default vnodes in version 4.0, I
>>>> would like to ask what would be the optimal procedure to reduce vnodes in
>>>> an existing 3.11.x cluster which was set up with the default value 256.
>>>> The cluster has 2 DCs with 5 nodes each and RF=3. There is one more
>>>> restriction: I cannot add more servers, nor create an additional DC;
>>>> everything is physical. This should be done without downtime.
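Sean's outline, combined with Anthony's note earlier in the thread about which directories to wipe, can be sketched as shell steps. This is a dry-run sketch only: the `run` wrapper echoes each command instead of executing it, and the DC name and paths are assumptions, not commands taken from the thread.

```shell
# Dry-run sketch: 'run' prints each step, so this can be read (and executed)
# without a live cluster. Replace the echo with real execution on real nodes.
run() { echo "+ $*"; }

# On each node in the DC being reworked (replication already cut over):
run nodetool decommission

# Wipe only the decommissioned nodes (default path layout assumed; the
# quoted glob is illustrative, not expanded here):
for d in data commitlog hints saved_caches; do
  run rm -rf "/var/lib/cassandra/$d/*"
done

# After editing cassandra.yaml (new num_tokens, auto_bootstrap: false) and
# restarting, stream data back from the untouched DC ("dc_other" is a
# placeholder DC name):
run nodetool rebuild -- dc_other
run nodetool enablebinary   # re-admit client connections
```

Running the sketch as-is just prints the command sequence; the point is the ordering: decommission, wipe, reconfigure and restart, then rebuild from the surviving DC before re-enabling clients.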
>>>> My idea for such a procedure would be:
>>>>
>>>> for each node:
>>>> - decommission node
>>>> - set auto_bootstrap to true and vnodes to 4
>>>> - start and wait till the node joins the cluster
>>>> - run cleanup on the rest of the nodes in the cluster
>>>> - run repair on the whole cluster (not sure if needed after cleanup)
>>>> - set auto_bootstrap to false
>>>> repeat for each node
>>>>
>>>> rolling restart of cluster
>>>> cluster repair
>>>>
>>>> Does this sound right? My concern is that after decommission, the node
>>>> will start on the same IP, which could create some confusion.
>>>>
>>>> Regards,
>>>> Maxim.
>>>>
>>>> ------------------------------
>>>>
>>>> The information in this Internet Email is confidential and may be
>>>> legally privileged. It is intended solely for the addressee. Access to
>>>> this Email by anyone else is unauthorized. If you are not the intended
>>>> recipient, any disclosure, copying, distribution or any action taken or
>>>> omitted to be taken in reliance on it, is prohibited and may be unlawful.
>>>> When addressed to our clients any opinions or advice contained in this
>>>> Email are subject to the terms and conditions expressed in any applicable
>>>> governing The Home Depot terms of business or client engagement letter.
>>>> The Home Depot disclaims all responsibility and liability for the
>>>> accuracy and content of this attachment and for any damages or losses
>>>> arising from any inaccuracies, errors, viruses, e.g., worms, trojan
>>>> horses, etc., or other items of a destructive nature, which may be
>>>> contained in this attachment and shall not be liable for direct,
>>>> indirect, consequential or special damages in connection with this e-mail
>>>> message or its attachment.
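Maxim's proposed per-node loop can be sketched the same dry-run way, purely to make the proposal concrete. Note Sean's caveat above: converting one node at a time would leave it with only a tiny token share while the other nodes remain at 256. Node names here are placeholders, and `say` only prints each step.

```shell
# Dry-run sketch of the per-node proposal from the thread; nothing executes.
say() { echo "+ $*"; }

for node in node1 node2 node3; do   # placeholder node names
  say "on $node: nodetool decommission"
  say "on $node: set num_tokens: 4 and auto_bootstrap: true, then restart"
  say "on remaining nodes: nodetool cleanup"
  say "cluster-wide: nodetool repair (possibly redundant after cleanup)"
  say "on $node: set auto_bootstrap back to false"
done
say "rolling restart of cluster"
say "cluster-wide repair"
```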