Date: Thu, 1 Oct 2009 12:22:30 -0500
Subject: Re: distributing tokens equally along the key distribution space
From: Jonathan Ellis
To: cassandra-user@incubator.apache.org

tokenupdater does not move data around; it's just an alternative to
setting <InitialToken> on each node. so you really want to get your
tokens right for your initial set of nodes before adding data.

we're finishing up full load balancing for 0.5 but even then it's best
to start with a reasonable distribution instead of starting with
random and forcing the balancer to move things around a bunch.

On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov wrote:
> What is the correct procedure for data re-partitioning?
> Suppose I have 3 nodes - "A", "B", "C"
> tokens on the ring:
> A: 0
> B: 2.8356863910078205288614550619314e+37
> C: 5.6713727820156410577229101238628e+37
>
> Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
> Start node "D" with -b
> Wait
> Run nodeprobe -host hostB ... cleanup on live "B"
> Wait
> Done
>
> Now data is not evenly balanced because tokens are not evenly spaced. I see
> that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater)
> What happens with keys and data if I run it on "A", "B", "C" and "D" with
> new, better spaced tokens? Should I? is there a better procedure?
>
> On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis wrote:
>>
>> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov wrote:
>> > Hi,
>> >
>> > Question#1:
>> > How to manually select tokens to force equal spacing of tokens around
>> > the hash space?
>>
>> (Answered by Jun.)
>>
>> > Question#2:
>> > Let's assume that #1 was resolved somehow and key distribution is more
>> > or less even.
>> > A new node "C" joins the cluster. Its token falls somewhere between two
>> > other tokens on the ring (from nodes "A" and "B" clockwise-ordered).
>> > From now on "C" is responsible for a portion of data that used to
>> > exclusively belong to "B".
>> > a. Cassandra v.0.4 will not automatically transfer this data to "C" will
>> > it?
>>
>> It will, if you start C with the -b ("bootstrap") flag.
>>
>> > b. Do all reads to these keys fail?
>>
>> No.
>>
>> > c. What happens with the data referenced by these keys on "B"? It will
>> > never be accessed there, therefore it becomes garbage. Since there is
>> > no GC will it stick forever?
>>
>> nodeprobe cleanup after the bootstrap completes will instruct B to
>> throw out data that has been copied to C.
>>
>> > d. What happens to replicas of these keys?
>>
>> These are also handled by -b.
>>
>> -Jonathan
>
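
For anyone picking initial tokens by hand, as the thread recommends, here is a minimal sketch of the arithmetic. It is not taken from the Cassandra source. The function names are hypothetical, and the ring size of 2**127 is an assumption about the partitioner in use, so it is passed in as a parameter you should verify for your setup. The ownership rule used here (each token owns the range from the previous token, exclusive, up to itself, inclusive, wrapping around) is likewise an assumption made for illustration.

```python
from fractions import Fraction

# Assumption: the token ring spans 0 .. 2**127. Verify this for your partitioner.
RING_SIZE = 2**127


def evenly_spaced_tokens(node_count, ring_size=RING_SIZE):
    """Return node_count integer tokens spaced equally around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


def ownership(tokens, ring_size=RING_SIZE):
    """Map each token to the fraction of the ring it is responsible for.

    Assumed rule: a token owns (previous token, itself], wrapping at the
    end of the ring. Tokens must be unique and lie in [0, ring_size).
    """
    ordered = sorted(tokens)
    if not ordered:
        raise ValueError("at least one token is required")
    if len(set(ordered)) != len(ordered):
        raise ValueError("tokens must be unique")
    if ordered[0] < 0 or ordered[-1] >= ring_size:
        raise ValueError("tokens must lie in [0, ring_size)")
    if len(ordered) == 1:
        return {ordered[0]: Fraction(1)}

    shares = {}
    for idx, token in enumerate(ordered):
        previous = ordered[idx - 1]  # idx 0 wraps around to the largest token
        width = (token - previous) % ring_size
        shares[token] = Fraction(width, ring_size)
    return shares


if __name__ == "__main__":
    # Candidate initial tokens for a four-node cluster.
    for node, token in zip("ABCD", evenly_spaced_tokens(4)):
        print(node, token)
```

On a toy ring of size 12, tokens 0, 2, 4 plus a later node at 1 follow the same pattern as the A, B, C, D layout above (D at B/2). ownership([0, 1, 2, 4], 12) then gives shares of 2/3, 1/12, 1/12 and 1/6, which shows why bootstrapping a node at B/2 leaves the ring unbalanced.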