From: Carlos Rolo <rolo@pythian.com>
Date: Mon, 30 Mar 2015 08:47:13 +0200
Subject: Re: Replication to second data center with different number of nodes
To: user@cassandra.apache.org

Sharing my experience here.

1) Never had any issues with different-size DCs. If the hardware is the same, keep the number at 256.
2) In most cases I keep the 256 vnodes and see no performance problems (when problems do appear, the cause is not the vnode count).

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Mon, Mar 30, 2015 at 6:31 AM, Anishek Agarwal wrote:

> Colin,
>
> When you said a larger number of tokens has a query performance hit, is it
> read or write performance? Also, if you have any links you could share to
> shed some light on this, that would be great.
>
> Thanks
> Anishek
>
> On Sun, Mar 29, 2015 at 2:20 AM, Colin Clark wrote:
>
>> I typically use a number a lot lower than 256, usually less than 20, for
>> num_tokens, as a larger number has historically had a dramatic impact on
>> query performance.
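
[The trade-off discussed above, where more vnodes per node smooth out data ownership across the cluster, can be sketched with a toy ring simulation. This is plain illustrative Python, not Cassandra's actual code: MD5 stands in for Murmur3, and the node names are made up.]

```python
import hashlib

def token(name):
    """Map a string to a position on a 0..2**64 ring (toy stand-in for Murmur3)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")

def build_ring(nodes, num_tokens):
    """Each node claims num_tokens pseudo-random positions (vnodes) on the ring."""
    return sorted((token(f"{n}:{i}"), n) for n in nodes for i in range(num_tokens))

def ownership(ring):
    """Fraction of the ring each node owns: the arc ending at each of its tokens."""
    size = 2 ** 64
    owned = {}
    for idx, (tok, node) in enumerate(ring):
        prev = ring[idx - 1][0]          # idx 0 wraps around to the last token
        arc = (tok - prev) % size
        owned[node] = owned.get(node, 0.0) + arc / size
    return owned

nodes = [f"node{i}" for i in range(6)]
for nt in (4, 256):
    own = ownership(build_ring(nodes, nt))
    spread = max(own.values()) - min(own.values())
    print(f"num_tokens={nt:3d}  ownership spread={spread:.3f}")
```

With only a few tokens per node, the gap between the most- and least-loaded node is large; at 256 vnodes the ownership fractions converge toward 1/6 each, which is why the default favors balance even though a larger token count costs more during range scans and repairs.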
>> --
>> Colin Clark
>> colin@clark.ws
>> +1 612-859-6129
>> skype colin.p.clark
>>
>> On Mar 28, 2015, at 3:46 PM, Eric Stevens wrote:
>>
>> If you're curious about how Cassandra knows how to replicate data in the
>> remote DC: it's the same as in the local DC. Replication is independent in
>> each, and you can even set a different replication strategy per keyspace
>> per datacenter. Nodes in each DC take up num_tokens positions on a ring,
>> each partition key is mapped to a position on that ring, and whoever owns
>> that part of the ring is the primary for that data. Then (oversimplified)
>> r-1 adjacent nodes become replicas for that same data.
>>
>> On Fri, Mar 27, 2015 at 6:55 AM, Sibbald, Charles <
>> Charles.Sibbald@bskyb.com> wrote:
>>
>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens
>>>
>>> So go with the default of 256, and leave initial_token empty:
>>>
>>> num_tokens: 256
>>> # initial_token:
>>>
>>> Cassandra will always give each node the same number of tokens; the only
>>> time you might want to distribute this differently is if your instances
>>> are of different sizing/capability, which is also a bad scenario.
>>>
>>> From: Björn Hachmann
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Friday, 27 March 2015 12:11
>>> To: user
>>> Subject: Re: Replication to second data center with different number of
>>> nodes
>>>
>>> 2015-03-27 11:58 GMT+01:00 Sibbald, Charles:
>>>
>>>> Cassandra's Vnodes config
>>>
>>> Thank you. Yes, we are using vnodes! The num_tokens parameter controls
>>> the number of vnodes assigned to a specific node.
>>>
>>> It might be that I am seeing problems where there are none.
>>>
>>> Let me rephrase my question: How does Cassandra know it has to
>>> replicate 1/3 of all keys to each single node in the second DC? I can see
>>> two ways:
>>> 1. It has to be configured explicitly.
>>> 2. It is derived from the number of nodes available in the data center
>>> at the time `nodetool rebuild` is started.
>>>
>>> Kind regards
>>> Björn
>>>
>>> Information in this email including any attachments may be
>>> privileged, confidential and is intended exclusively for the addressee. The
>>> views expressed may not be official policy, but the personal views of the
>>> originator. If you have received it in error, please notify the sender by
>>> return e-mail and delete it from your system. You should not reproduce,
>>> distribute, store, retransmit, use or disclose its contents to anyone.
>>> Please note we reserve the right to monitor all e-mail communication
>>> through our internal and external networks. SKY and the SKY marks are
>>> trademarks of Sky plc and Sky International AG and are used under licence.
>>> Sky UK Limited (Registration No. 2906991), Sky-In-Home Service Limited
>>> (Registration No. 2067075) and Sky Subscribers Services Limited
>>> (Registration No. 2340150) are direct or indirect subsidiaries of Sky plc
>>> (Registration No. 2247735). All of the companies mentioned in this
>>> paragraph are incorporated in England and Wales and share the same
>>> registered office at Grant Way, Isleworth, Middlesex TW7 5QD.
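
[Eric's earlier description, where each DC independently walks the same ring and takes the first RF distinct local nodes as replicas, also covers the "1/3 of all keys on each node" case: with a replication factor of 3 in a three-node DC, every key necessarily lands on all three nodes, regardless of when `nodetool rebuild` runs. A minimal sketch of that placement logic, assuming a simplified NetworkTopologyStrategy with racks ignored; the hashing and node names are illustrative only, not Cassandra's implementation.]

```python
import hashlib
from bisect import bisect_right

def token(name):
    """Toy token function (stand-in for Murmur3)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")

def replicas_for(key, ring, rf_per_dc, dc_of):
    """Walk the ring clockwise from the key's token, picking the first rf
    distinct nodes in each DC (simplified NetworkTopologyStrategy)."""
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, token(key)) % len(ring)
    chosen = {dc: [] for dc in rf_per_dc}
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        dc = dc_of[node]
        if node not in chosen[dc] and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(node)
    return chosen

# 6 nodes in DC1, 3 nodes in DC2, RF = 3 in both DCs (hypothetical topology).
dc_of = {f"dc1-n{i}": "DC1" for i in range(6)}
dc_of.update({f"dc2-n{i}": "DC2" for i in range(3)})
ring = sorted((token(f"{n}:{v}"), n) for n in dc_of for v in range(8))

placement = replicas_for("some-partition-key", ring, {"DC1": 3, "DC2": 3}, dc_of)
print(placement)
# DC1 holds each key on 3 of its 6 nodes; DC2, with only 3 nodes and RF=3,
# holds every key on all three of its nodes.
```

So the answer to the two options is closer to neither: the per-node share is not configured key-by-key and not frozen at rebuild time, but follows from the keyspace's per-DC replication factor plus the current ring topology.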