Subject: Re: distribution of token ranges with virtual nodes
From: Eric Evans
To: user@cassandra.apache.org
Date: Wed, 31 Oct 2012 13:17:20 -0500

On Wed, Oct 31, 2012 at 11:38 AM, John Sanda wrote:
> Can/should i assume that i will get even range distribution or close to it
> with random token selection?

The short answer is: If you're using virtual nodes, random token
selection will give you even range distribution.

The somewhat longer answer is that this is really a function of the
total number of tokens. The more randomly generated tokens a cluster
has, the more the distribution will even out. The reason this can work
for virtual nodes where it has not for the older 1-token-per-node
model is that (assuming a reasonable num_tokens value) virtual nodes
give you a much higher token count for a given number of nodes.
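
To make the effect of token count concrete, here is a rough simulation
sketch. It is not Cassandra code: it assumes a RandomPartitioner-style
ring of 2**127 positions, and the names ownership and spread are made up
for illustration.

```python
import random

# Assumption: a RandomPartitioner-style token space of 2**127 positions.
RING_SIZE = 2 ** 127


def ownership(num_nodes, num_tokens, seed=0):
    """Return each node's share of the ring under random token selection."""
    rng = random.Random(seed)
    # Every node picks num_tokens random tokens; remember who owns each one.
    tokens = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * num_nodes
    # A token owns the range from the previous token up to itself, so the
    # first token's range wraps around from the last token on the ring.
    prev = tokens[-1][0] - RING_SIZE
    for token, node in tokens:
        owned[node] += token - prev
        prev = token
    return [o / RING_SIZE for o in owned]


def spread(num_nodes, num_tokens, seed=0):
    """Largest minus smallest ownership share, as a fraction of the ring."""
    shares = ownership(num_nodes, num_tokens, seed)
    return max(shares) - min(shares)
```

Comparing spread(6, 1) with spread(6, 256) over a handful of seeds should
show the gap between the largest and smallest owner shrinking as the
total token count grows.
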
That wiki page you cite wasn't really intended to be documentation
(expect some of that soon though), but what that section was trying to
convey was that while random distribution is quite good, it may not be
100% perfect, especially when the number of nodes is low (remember,
the number of tokens scales with the number of nodes).

I think this is (or may be) a problem for some. If you're forced to
manually calculate tokens then you are quite naturally going to
calculate a perfect distribution, and if you've grown accustomed to
this, seeing the ownership values off by a few percent could really
bring out your inner OCD. :)

> For the sake of discussion, what is a reasonable default to start
> with for num_tokens assuming nodes are homogenous? That wiki page mentions a
> default of 256 which I see commented out in cassandra.yaml; however,
> Config.num_tokens is set to 1.

The (unconfigured) default is 1. That is to say that virtual nodes are
not enabled. The current recommendation when setting this (documented
in the config) is 256.

> Maybe I missed where the default of 256 is
> used. From some initial testing though, it looks like 1 token per node is
> being used. Using defaults in cassandra.yaml, I see this in my logs,

Right. And it's worth noting that if you uncomment num_tokens *after*
starting a node with it commented (i.e. num_tokens: 1), then it will
migrate you to virtual nodes by splitting the existing partition 256
ways. This is *not* the equivalent of starting a node with num_tokens
= 256 for the first time. The latter would leave you with randomized
placement; the former would require you to perform a shuffle to
achieve that.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu
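
The difference between splitting an existing range and starting with
randomized placement can be sketched as follows. This is only an
illustration under assumptions, not how Cassandra implements either
path: it assumes the split produces contiguous sub-ranges that stay with
the same node, and split_tokens, random_tokens and owner_changes are
hypothetical names.

```python
import random

# Assumption: a RandomPartitioner-style token space of 2**127 positions.
RING_SIZE = 2 ** 127


def split_tokens(num_nodes, num_tokens):
    """One evenly spaced range per node, each cut into contiguous pieces."""
    step = RING_SIZE // num_nodes
    sub = step // num_tokens
    return sorted(
        (node * step + i * sub, node)
        for node in range(num_nodes)
        for i in range(num_tokens)
    )


def random_tokens(num_nodes, num_tokens, seed=0):
    """Every node picks num_tokens tokens at random."""
    rng = random.Random(seed)
    return sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )


def owner_changes(tokens):
    """Count neighbouring ring positions (with wraparound) whose owners differ."""
    owners = [node for _, node in tokens]
    return sum(1 for a, b in zip(owners, owners[1:] + owners[:1]) if a != b)
```

In this model, owner_changes(split_tokens(3, 256)) stays at one change
per node because each node's 256 pieces sit next to each other, while
owner_changes(random_tokens(3, 256)) is far higher because the pieces
are interleaved around the ring.
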