From user-return-28609-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Thu Sep 6 18:17:43 2012
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Reply-To: user@cassandra.apache.org
Date: Thu, 6 Sep 2012 13:17:12 -0500
Subject: Re: new "nodetool ring" output and unbalanced ring?
From: Tyler Hobbs <tyler@datastax.com>
To: user@cassandra.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8fb2067e17778204c90c8011

--e89a8fb2067e17778204c90c8011
Content-Type: text/plain; charset=ISO-8859-1

To minimize the impact on the cluster, I would bootstrap a new 1d node at
(42535295865117307932921825928971026432 - 100), then decommission the 1c
node at 42535295865117307932921825928971026432 and run cleanup on your
us-east nodes.

On Thu, Sep 6, 2012 at 1:11 PM, William Oberman <oberman@civicscience.com> wrote:

> Didn't notice the racks!  Of course....
>
> If I change a 1c to a 1d, what would I have to do to make sure data
> shuffles around correctly?  Repair everywhere?
>
> will
>
> On Thu, Sep 6, 2012 at 2:09 PM, Tyler Hobbs <tyler@datastax.com> wrote:
>
>> The main issue is that one of your us-east nodes is in rack 1d, while the
>> rest are in rack 1c.  With NTS and multiple racks, Cassandra will try to
>> use one node from each rack as a replica for a range until it either meets
>> the RF for the DC, or runs out of racks, in which case it just picks nodes
>> sequentially going clockwise around the ring (starting from the range being
>> considered, not the last node that was chosen as a replica).
>>
>> To fix this, you'll either need to make the 1d node a 1c node, or make
>> 42535295865117307932921825928971026432 a 1d node so that you're alternating
>> racks within that DC.
>>
>>
>> On Thu, Sep 6, 2012 at 12:54 PM, William Oberman <oberman@civicscience.com> wrote:
>>
>>> Hi,
>>>
>>> I recently upgraded from 0.8.x to 1.1.x (through 1.0 briefly) and
>>> nodetool ring seems to have changed from "owns" to "effectively owns".
>>> "Effectively owns" seems to account for replication factor (RF).  I'm ok
>>> with all of this, yet I still can't figure out what's up with my cluster.
>>> I have a NetworkTopologyStrategy with two data centers (DCs) with
>>> RF/number of nodes in DC combinations of:
>>> DC Name, RF, # in DC
>>> analytics, 1, 2
>>> us-east, 3, 4
>>> So I'd expect 50% on each analytics node, and 75% for each us-east node.
>>> Instead, I have two nodes in us-east with 50/100??? (the other two are
>>> 75/75 as expected).
>>>
>>> Here is the output of nodetool (all nodes report the same thing):
>>> Address   DC         Rack  Status  State   Load       Effective-Ownership  Token
>>>                                                                            127605887595351923798765477786913079296
>>> x.x.x.x   us-east    1c    Up      Normal  94.57 GB   75.00%               0
>>> x.x.x.x   analytics  1c    Up      Normal  60.64 GB   50.00%               1
>>> x.x.x.x   us-east    1c    Up      Normal  131.76 GB  75.00%               42535295865117307932921825928971026432
>>> x.x.x.x   us-east    1c    Up      Normal  43.45 GB   50.00%               85070591730234615865843651857942052864
>>> x.x.x.x   analytics  1d    Up      Normal  60.88 GB   50.00%               85070591730234615865843651857942052865
>>> x.x.x.x   us-east    1d    Up      Normal  98.56 GB   100.00%              127605887595351923798765477786913079296
>>>
>>> If I use cassandra-cli to do "show keyspaces;" I get (and again, all
>>> nodes report the same thing):
>>> Keyspace: civicscience:
>>>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>>>   Durable Writes: true
>>>     Options: [analytics:1, us-east:3]
>>> I removed the output about all of my column families (CFs), hopefully
>>> that doesn't matter.
>>>
>>> Did I compute the tokens wrong?  Is there a combination of nodetool
>>> commands I can run to migrate the data around to rebalance to 75/75/75/75?
>>> I routinely run repair already.  And as the release notes required, I ran
>>> upgradesstables during the upgrade process.
>>>
>>> Before the upgrade, I was getting analytics = 0%, and us-east = 25% on
>>> each node, which I expected for "owns".
>>>
>>> will
>>>
>>
>> --
>> Tyler Hobbs
>> DataStax
>>
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) oberman@civicscience.com
>

-- 
Tyler Hobbs
DataStax

--e89a8fb2067e17778204c90c8011--
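
The rack-aware placement rule Tyler describes in this thread can be checked against the ownership figures William reported. The Python sketch below is a simplified model of that rule as worded in the thread, not Cassandra's actual NetworkTopologyStrategy code. The function names (`replicas_for`, `effective_ownership`, `report`) are made up for illustration, and the token space of 2**127 is an assumption based on the tokens being quarter points of that range (RandomPartitioner).

```python
from collections import defaultdict

RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token space

# (token, dc, rack), copied from the nodetool ring output in the thread
NODES = [
    (0, "us-east", "1c"),
    (1, "analytics", "1c"),
    (42535295865117307932921825928971026432, "us-east", "1c"),
    (85070591730234615865843651857942052864, "us-east", "1c"),
    (85070591730234615865843651857942052865, "analytics", "1d"),
    (127605887595351923798765477786913079296, "us-east", "1d"),
]
RF = {"analytics": 1, "us-east": 3}  # from the "show keyspaces;" output


def replicas_for(ring, start, dc, rf):
    """Replicas inside `dc` for the range whose primary is ring[start]."""
    # Nodes of this DC in clockwise order, starting at the range's own token.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    in_dc = [node for node in walk if node[1] == dc]
    chosen, racks_seen = [], set()
    # Pass 1: take one node per distinct rack until RF is met or racks run out.
    for node in in_dc:
        if len(chosen) == rf:
            break
        if node[2] not in racks_seen:
            racks_seen.add(node[2])
            chosen.append(node)
    # Pass 2: fill what is left sequentially, restarting from the range itself
    # (not from the last replica chosen), as described in the thread.
    for node in in_dc:
        if len(chosen) == rf:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


def effective_ownership(nodes, rf_by_dc):
    """Map each node to the percentage of the ring it holds a replica for."""
    ring = sorted(nodes)  # tokens are unique, so this orders by token
    owned = defaultdict(int)
    for i, node in enumerate(ring):
        # Range (previous token, this token]; index -1 wraps around the ring.
        size = (node[0] - ring[i - 1][0]) % RING_SIZE or RING_SIZE
        for dc, rf in rf_by_dc.items():
            for replica in replicas_for(ring, i, dc, rf):
                owned[replica] += size
    return {node: 100 * owned[node] / RING_SIZE for node in ring}


def report(title, nodes):
    print(title)
    for (token, dc, rack), pct in effective_ownership(nodes, RF).items():
        print(f"  {dc:<10} {rack}  {pct:6.2f}%  {token}")


if __name__ == "__main__":
    report("current ring", NODES)
    # Proposed fix: swap the 1c node at this token for a 1d node at token - 100.
    old = 42535295865117307932921825928971026432
    fixed = [n for n in NODES if n[0] != old] + [(old - 100, "us-east", "1d")]
    report("after replacing the 1c node with a 1d node", fixed)
```

Under these assumptions the model gives 75/50/75/50/50/100 in ring order for the current layout, matching the nodetool output, and 75% for every us-east node once the racks alternate. It only models replica placement. The data movement itself still needs the bootstrap, decommission, and cleanup steps from the reply above.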