From: Alexander Dejanovski
Date: Mon, 15 Jan 2018 16:55:40 +0000
Subject: Re: vnodes: high availability
To: user@cassandra.apache.org

Hi Kyrylo,

The situation is a bit more nuanced than shown by the DataStax diagram, which is fairly theoretical.
If you're using SimpleStrategy, there is no rack awareness. Since vnode distribution is purely random, and the replica for a vnode is placed on the node that owns the next vnode in token order (yeah, that's not easy to formulate), you can only reason about availability statistically.

I kinda suck at maths but I'm going to risk making a fool of myself :)

The odds for one given vnode to have a replica on a specific other node are, in your case, 2/49 (out of the 49 remaining nodes, 2 replicas need to be placed).
Given you have 256 vnodes, the odds for at least one vnode of a single node to exist on another one are 256*(2/49) = 10.4%.
Since the relationship is bi-directional (the odds for node B to have a vnode replicated on node A are the same as the opposite), that doubles the odds of 2 nodes both being replicas for at least one vnode: 20.8%.

Having a smaller number of vnodes will decrease the odds, as will having more nodes in the cluster.
(Now once again, I hope my maths aren't fully wrong; I'm pretty rusty in that area...)
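
A quick way to sanity-check these odds is to simulate the placement rather than compute it by hand. The following rough, self-contained sketch is for illustration only: the class name, the fixed seed, and the simplified model of SimpleStrategy placement (256 random tokens per node, each range replicated to its owner plus the next two distinct nodes in ring order) are assumptions, not something taken from this thread. It estimates how often a random pair of down nodes leaves at least one range unable to reach QUORUM:

import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

public class QuorumLossSim {

    public static void main(String[] args) {
        int nodes = 50, vnodesPerNode = 256, rf = 3, trials = 200;
        Random rnd = new Random(42);
        int quorumLost = 0;

        for (int t = 0; t < trials; t++) {
            // Random token assignment per node, roughly what num_tokens=256 gives you.
            TreeMap<Long, Integer> ring = new TreeMap<>();
            for (int n = 0; n < nodes; n++) {
                for (int v = 0; v < vnodesPerNode; v++) {
                    ring.put(rnd.nextLong(), n);
                }
            }
            // TreeMap iterates in token order, so this is the ring walked clockwise.
            Integer[] owners = ring.values().toArray(new Integer[0]);

            // Take two distinct nodes down.
            int downA = rnd.nextInt(nodes);
            int downB;
            do { downB = rnd.nextInt(nodes); } while (downB == downA);

            boolean lost = false;
            for (int i = 0; i < owners.length && !lost; i++) {
                // SimpleStrategy-style replicas for the range owned by vnode i:
                // the owner plus the next rf-1 distinct nodes on the ring.
                Set<Integer> replicas = new LinkedHashSet<>();
                for (int j = 0; replicas.size() < rf; j++) {
                    replicas.add(owners[(i + j) % owners.length]);
                }
                int downReplicas = 0;
                for (int r : replicas) {
                    if (r == downA || r == downB) downReplicas++;
                }
                if (downReplicas >= 2) lost = true; // QUORUM (2 of 3) not reachable
            }
            if (lost) quorumLost++;
        }
        System.out.printf("QUORUM lost for at least one range in %d of %d trials (%.1f%%)%n",
                quorumLost, trials, 100.0 * quorumLost / trials);
    }
}

Compile and run it with a plain JDK (javac QuorumLossSim.java && java QuorumLossSim); playing with the nodes and vnodesPerNode constants shows how the odds move in the directions described above.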

How many queries this will affect is a different question, as it depends on which partitions currently exist and are queried in the unavailable token ranges.

Then you have the rack awareness that comes with NetworkTopologyStrategy:
If the number of replicas (3 in your case) is proportional to the number of racks, Cassandra will spread replicas across different ones.
In that situation, you can theoretically lose as many nodes as you want in a single rack and still have two other replicas available to satisfy quorum in the remaining racks.
If you start losing nodes in different racks, we're back to doing maths (but the odds will be slightly different).

That makes maintenance predictable because you can shut down as many nodes as you want in a single rack without losing QUORUM.
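
As a side note on how that rack awareness is enabled: it only applies to keyspaces that use NetworkTopologyStrategy together with a rack-aware snitch. Below is a minimal sketch of defining such a keyspace through the Java driver; the contact point, keyspace name (my_ks) and data center name (dc1) are placeholders for illustration, not values from this cluster:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class RackAwareKeyspace {
    public static void main(String[] args) {
        // Placeholder contact point, keyspace name and data center name.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // With a rack-aware snitch (e.g. GossipingPropertyFileSnitch), these 3 replicas
            // per data center are placed in distinct racks where possible.
            session.execute("CREATE KEYSPACE IF NOT EXISTS my_ks WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'dc1': 3}");
        }
    }
}

The same CREATE KEYSPACE statement can of course be run directly from cqlsh instead of going through the driver.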

Feel free to correct my numbers if I'm wrong.

Cheers,





On Mon, Jan 15, 2018 at 5:27 PM Kyrylo Lebediev <Kyrylo_Lebediev@epam.com> wrote:

Thanks, Rahul.

But in your example, loss of Node3 and Node6 at the same time leads to loss of ranges N, C, and J at consistency level QUORUM.


As far as I understand, in case vnodes > N_nodes_in_cluster and endpoint_snitch=SimpleSnitch, since:


1) "secondary" replicas are placed on two nodes 'next' to= the node responsible for a range (in case of RF=3D3)

2) there are a lot of vnodes on each node
3) ranges are evenly distributed between vnodes in case of SimpleSnitch,


we get all physical nodes (servers) having mutually adjacent token ranges.
Is it correct?

At least, in the case of my real-world ~50-node cluster with vnodes=256 and RF=3, this command:

nodetool ring | grep '^<ip-prefix>' | awk '{print $1}' | uniq | grep -B2 -A2 '<ip_of_a_node>' | grep -v '<ip_of_a_node>' | grep -v '^--' | sort | uniq | wc -l

returned a number equal to Nnodes - 1, which means that I can't switch off 2 nodes at the same time w/o losing some keyrange for CL=QUORUM.


Thanks,

Kyrill


From: Rahul Neelakantan <rahul@rahul.be>
Sent: Monday, January 15, 2018 5:20:20 PM
To: user@cassandra.apache.org
Subject: Re: vnodes: high availability
Not necessarily. It depends on how the token ranges for the vNodes are assigned to them. For example, take a look at this diagram:

http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureDataDistributeDistribute_c.html
In the vNode part of the diagram, you will see that loss of Node 3 and Node 6 will still not have any effect on Token Range A. But yes, if you lose two nodes that both have Token Range A assigned to them (say Node 1 and Node 2), you will have unavailability with your specified configuration.

You can sort of circumvent this by using the DataStax Java Driver and having the client recognize a degraded cluster and operate temporarily in downgraded consistency mode:

http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
- Rahul
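
To make the suggestion above concrete, here is a minimal sketch of wiring that policy into the 3.x Java driver; the contact point and the test query are placeholders. Wrapping the policy in LoggingRetryPolicy makes every downgrade decision visible in the client logs, which matters because silently reading or writing at a lower consistency level is a trade-off you want to be able to audit:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;
import com.datastax.driver.core.policies.LoggingRetryPolicy;

public class DowngradingClient {
    public static void main(String[] args) {
        // Placeholder contact point. LoggingRetryPolicy wraps the downgrading policy
        // so every retry/downgrade decision shows up in the client logs.
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withRetryPolicy(new LoggingRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE))
                .build();
             Session session = cluster.connect()) {
            session.execute("SELECT release_version FROM system.local");
        }
    }
}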

On Mon, Jan 15, 2018 at 10:04 AM, Kyrylo Lebediev <Kyrylo_Lebediev@epam.com> wrote:

Hi,


Let's say we have a C* cluster with the following parameters:

- 50 nodes in the cluster

- RF=3

- vnodes=256 per node

- CL for some queries = QUORUM

- endpoint_snitch = SimpleSnitch


Is it correct that any 2 nodes down will cause unavailability of a keyrange at CL=QUORUM?


Regards,

Kyrill




--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com