From: Enrico Cavallin
Date: Mon, 2 Dec 2019 12:03:31 +0100
Subject: Re: Uneven token distribution with allocate_tokens_for_keyspace
To: user@cassandra.apache.org

Hi Anthony,
thank you for your hints; the new DC is now balanced to within 2%.
I did read your article, but I thought it applied only to new "clusters", not to new "DCs" as well; since RF is set per DC, it makes sense.

You TLP guys are doing a great job for the Cassandra community.

Thank you,
Enrico

On Fri, 29 Nov 2019 at 05:09, Anthony Grasso wrote:

> Hi Enrico,
>
> This is a classic chicken and egg problem with the
> allocate_tokens_for_keyspace setting.
>
> The allocate_tokens_for_keyspace setting uses the replication factor that
> a keyspace has in a DC to calculate the token allocation when a node is
> added to the cluster for the first time.
>
> Nodes need to be added to the new DC before we can replicate the keyspace
> over to it. Herein lies the problem. We are unable to use
> allocate_tokens_for_keyspace unless the keyspace is replicated to the new
> DC. In addition, as soon as you change the keyspace replication to include
> the new DC, new data will start to be written to it. To work around this
> issue you will need to do the following.
>
>    1. Decommission all the nodes in *dcNew*, one at a time.
>    2. Once all the *dcNew* nodes are decommissioned, wipe the contents of
>    the *commitlog*, *data*, *saved_caches*, and *hints* directories on
>    these nodes.
>    3. Make the first node you add to *dcNew* a seed node. Set the seed
>    list of the first node to its own IP address and the IP addresses of
>    the other seed nodes in the cluster.
>    4. Set the *initial_token* setting for the first node. You can
>    calculate the values using the algorithm in my blog post:
>    https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html.
>    For convenience I have calculated them:
>    *-9223372036854775808,-4611686018427387904,0,4611686018427387904*.
>    Note: remove the *allocate_tokens_for_keyspace* setting from the
>    *cassandra.yaml* file for this (seed) node.
>    5. Check that no other node in the cluster is assigned any of the four
>    tokens specified above. If another node in the cluster is assigned one
>    of these tokens, increment the conflicting token by one until no other
>    node in the cluster is assigned that token value. The idea is to make
>    sure that these four tokens are unique to the node.
>    6. Add the seed node to the cluster. Make sure it is listed in *dcNew*
>    by checking nodetool status.
>    7. Create a dummy keyspace in *dcNew* that has a replication factor of
>    2.
>    8. Set the *allocate_tokens_for_keyspace* value to the name of the
>    dummy keyspace for the other two nodes you want to add to *dcNew*.
>    Note: remove the *initial_token* setting for these other nodes.
>    9. Set *auto_bootstrap* to *false* for the other two nodes you want to
>    add to *dcNew*.
>    10. Add the other two nodes to the cluster, one at a time.
>    11. If you are happy with the distribution, copy the data to *dcNew*
>    by running a rebuild.
>
> Hope this helps.
>
> Regards,
> Anthony
>
> On Fri, 29 Nov 2019 at 02:08, Enrico Cavallin wrote:
>
>> Hi all,
>> I have an old datacenter with 4 nodes and 256 tokens each.
>> I am now starting a new datacenter with 3 nodes, with num_tokens=4
>> and allocate_tokens_for_keyspace=myBiggestKeyspace on each node.
>> Both DCs run Cassandra 3.11.x.
>>
>> myBiggestKeyspace has RF=3 in dcOld and RF=2 in dcNew. Now dcNew is very
>> unbalanced.
>> Keyspaces with RF=2 in both DCs have the same problem.
>> Did I miss something, or does allocate_tokens_for_keyspace still have
>> strong limitations with a low num_tokens?
>> Any suggestions on how to mitigate it?
>>
>> # nodetool status myBiggestKeyspace
>> Datacenter: dcOld
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address  Load        Tokens  Owns (effective)  Host ID                               Rack
>> UN  x.x.x.x  515.83 GiB  256     76.2%             fc462eb2-752f-4d26-aae3-84cb9c977b8a  rack1
>> UN  x.x.x.x  504.09 GiB  256     72.7%             d7af8685-ba95-4854-a220-bc52dc242e9c  rack1
>> UN  x.x.x.x  507.50 GiB  256     74.6%             b3a4d3d1-e87d-468b-a7d9-3c104e219536  rack1
>> UN  x.x.x.x  490.81 GiB  256     76.5%             41e80c5b-e4e3-46f6-a16f-c784c0132dbc  rack1
>>
>> Datacenter: dcNew
>> ==============
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address  Load        Tokens  Owns (effective)  Host ID                               Rack
>> UN  x.x.x.x  145.47 KiB  4       56.3%             7d089351-077f-4c36-a2f5-007682f9c215  rack1
>> UN  x.x.x.x  122.51 KiB  4       55.5%             625dafcb-0822-4c8b-8551-5350c528907a  rack1
>> UN  x.x.x.x  127.53 KiB  4       88.2%             c64c0ce4-2f85-4323-b0ba-71d70b8e6fbf  rack1
>>
>> Thanks,
>> -- ec
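For steps 4 and 5 of Anthony's procedure, the token values can be reproduced with a few lines of Python. This is only a minimal sketch, assuming the cluster uses the default Murmur3Partitioner (token range -2^63 to 2^63-1); the function names and the example set of already-taken tokens are hypothetical and not taken from the blog post.

```python
# Assumption: Murmur3Partitioner, whose tokens span -2**63 .. 2**63 - 1.
MIN_TOKEN = -2**63
RING_SIZE = 2**64


def evenly_spaced_tokens(num_tokens):
    """Return num_tokens values spaced evenly around the ring (step 4)."""
    step = RING_SIZE // num_tokens
    return [MIN_TOKEN + i * step for i in range(num_tokens)]


def avoid_conflicts(tokens, taken):
    """Increment each conflicting token by one until it is unique (step 5).

    `taken` is the set of tokens already owned by other nodes; here it is a
    hypothetical input you would collect from the existing cluster.
    """
    taken = set(taken)
    result = []
    for token in tokens:
        while token in taken:
            token += 1
        result.append(token)
        taken.add(token)
    return result


tokens = evenly_spaced_tokens(4)
# Value to paste into initial_token for the first (seed) node of dcNew.
print(",".join(str(t) for t in tokens))
```

With num_tokens=4 this prints the same four values Anthony lists. `avoid_conflicts` only matters if another node already owns one of them.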
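To put a number on "unbalanced", the "Owns (effective)" column from the quoted nodetool status output can be compared with the ideal share per node. A rough sketch, assuming the effective ownership within a DC adds up to RF x 100% for the keyspace (the figures quoted above are consistent with that); the function name is hypothetical.

```python
def ownership_deviation(owns_percent, replication_factor):
    """Compare each node's effective ownership with the ideal share.

    The ideal share assumes the DC's total effective ownership is
    replication_factor * 100%, split evenly across its nodes.
    Returns (ideal_percent, deviations in percentage points).
    """
    ideal = 100.0 * replication_factor / len(owns_percent)
    return ideal, [p - ideal for p in owns_percent]


# Figures copied from the nodetool status output in this thread.
for name, owns, rf in [
    ("dcOld", [76.2, 72.7, 74.6, 76.5], 3),
    ("dcNew", [56.3, 55.5, 88.2], 2),
]:
    ideal, deviations = ownership_deviation(owns, rf)
    worst = max(abs(d) for d in deviations)
    print(f"{name}: ideal {ideal:.1f}%, worst deviation {worst:.1f} points")
```

In this reading dcOld stays within a few points of its 75% ideal, while one dcNew node sits more than 20 points above the roughly 66.7% ideal, which is the imbalance the thread is about.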