Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of dmcnelis@gmail.com designates
 209.85.217.180 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CADjM4zt18LLB_abaGfgLaqT=SChaeedvOrvvJ1L7=q6bhQ4NtA@mail.gmail.com>
References: 
 <CACy0uxkHGciP9EGqpeEJfr1igBeudSNF7Zv7OROtfrCM2MuOdA@mail.gmail.com>
	<CADjM4zt18LLB_abaGfgLaqT=SChaeedvOrvvJ1L7=q6bhQ4NtA@mail.gmail.com>
Date: Fri, 26 Apr 2013 09:34:05 -0500
Message-ID: 
 <CACy0uxkb26yeq93hen7ACPvobd9W4LX=dvzR97GDw1MGxS5N8g@mail.gmail.com>
Subject: Re: vnodes and load balancing - 1.2.4
From: David McNelis <dmcnelis@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=089e0149423259cd5b04db446d9a

--089e0149423259cd5b04db446d9a
Content-Type: text/plain; charset=ISO-8859-1

Decommissioning those nodes isn't a problem.  When you say remove all the
data, I assume you mean rm -rf on my data directory (the default
/var/lib/cassandra/data)?

I'd done this prior to starting up the nodes, because they were installed
from the apt-get repo, which automatically starts cassandra (bad form on
that, as a side note).  But the first time I tried to start the node
after setting my config, I got an error that System.Users didn't exist and
it exited.  The second time I tried to start the nodes, they started.

Outside of clearing the data, having no value for initial_token, and having
num_tokens set, is there anything else I need to do to bring them in and
bootstrap them?
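
A minimal sketch of the config half of that checklist, assuming the
settings sit as top-level keys in cassandra.yaml. The function name
check_vnode_config is hypothetical and not part of Cassandra; it takes the
file contents as text and does not look at the data directory at all.

```python
import re


def check_vnode_config(yaml_text):
    """Return a list of problems in cassandra.yaml text for a vnode bootstrap.

    Hypothetical helper for illustration only; it reads just the two
    settings discussed in this thread and ignores commented-out lines.
    """
    settings = {}
    for line in yaml_text.splitlines():
        # Only uncommented, top-level keys match (the key starts the line).
        m = re.match(r"^(num_tokens|initial_token)\s*:\s*(.*)$", line)
        if m:
            # Drop any trailing comment and surrounding whitespace.
            settings[m.group(1)] = m.group(2).split("#", 1)[0].strip()

    problems = []
    num_tokens = settings.get("num_tokens", "")
    if not num_tokens.isdigit() or int(num_tokens) < 2:
        problems.append("num_tokens is not set to a value greater than 1")
    if settings.get("initial_token"):
        problems.append("initial_token is set; leave it empty or commented out")
    return problems


if __name__ == "__main__":
    sample = "num_tokens: 256\n# initial_token:\n"
    print(check_vnode_config(sample) or "config looks OK for a vnode bootstrap")
```

The check only means something if it passes before the node's first
start, since the package install starts cassandra on its own.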

Note, these aren't the first nodes that I've added to the cluster, but they
are giving me fits.  Additionally, this morning, my seed nodes were all
flipping out with an error like:
https://gist.github.com/dmcnelis/5467636  (AssertionError, when trying to
determine ranges for nodes)

Once I decommissioned the new nodes, I had no more errors in my seed node
logs.
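
To see why a node that first came up with a single token ends up with
almost nothing (Sam's explanation, quoted below), here is a rough sketch of
the arithmetic. It is not Cassandra's code: the 2**64 ring size, the random
token choice, and the even split are all assumptions made for illustration.
The cluster shape (7 nodes at 256 tokens, 2 at 512) is the one from this
thread.

```python
import bisect
import random

RING = 2**64  # assumed ring size, for illustration only


def ownership(tokens_by_node):
    """Fraction of the ring owned per node; a token owns (previous token, token]."""
    ring = sorted((t, n) for n, toks in tokens_by_node.items() for t in toks)
    owned = {n: 0 for n in tokens_by_node}
    for i, (tok, node) in enumerate(ring):
        prev = ring[i - 1][0]  # i == 0 wraps around to the last token
        owned[node] += (tok - prev) % RING
    return {n: v / RING for n, v in owned.items()}


def split_range(prev, tok, parts):
    """Split the range (prev, tok] into `parts` evenly spaced tokens ending at tok."""
    width = (tok - prev) % RING
    return [(prev + width * k // parts) % RING for k in range(1, parts + 1)]


def simulate(seed=0):
    rng = random.Random(seed)
    # Existing cluster: 7 nodes with 256 tokens, 2 nodes with 512 tokens.
    cluster = {f"old{i}": [rng.randrange(RING) for _ in range(256)] for i in range(7)}
    cluster.update({f"big{i}": [rng.randrange(RING) for _ in range(512)] for i in range(2)})

    # Case 1: the node first joins with one token, then splits that one range.
    existing = sorted(t for toks in cluster.values() for t in toks)
    single = rng.randrange(RING)
    prev = existing[bisect.bisect_left(existing, single) - 1]
    bad = dict(cluster, new=split_range(prev, single, 256))

    # Case 2: the node bootstraps with 256 tokens spread over the whole ring.
    good = dict(cluster, new=[rng.randrange(RING) for _ in range(256)])

    return ownership(bad)["new"], ownership(good)["new"]


if __name__ == "__main__":
    bad, good = simulate()
    print(f"single token then split: {bad:.5%} of the ring")
    print(f"256 tokens from the start: {good:.5%} of the ring")
```

In the first case all 256 tokens sit inside one existing range, so the
node's share stays at whatever that single range was worth, no matter how
many pieces it is cut into.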


On Fri, Apr 26, 2013 at 5:48 AM, Sam Overton <sam@acunu.com> wrote:

> Some extra information you could provide which will help debug this: the
> logs from those 3 nodes which have no data and the output of "nodetool ring"
>
> Before seeing those I can only guess, but my guess would be that in the
> logs on those 3 nodes you will see this: "Calculating new tokens" and this:
> "Split previous range (blah, blah] into <long list of tokens>"
>
> If that is the case then it means you accidentally started those three
> nodes with the default configuration (single-token) and then subsequently
> changed (num_tokens) and then joined them into the cluster. What happens
> when you do this is that the node thinks it used to be responsible for a
> single range and is being migrated to vnodes, so it splits its single range
> (now a very small part of the keyspace) into 256 smaller ranges, and ends
> up with just a tiny portion of the ring assigned to it.
>
> To fix this you'll need to decommission those 3 nodes, remove all data
> from them, then bootstrap them in again with the correct configuration from
> the start.
>
>  Sam
>
>
>
> On 26 April 2013 06:07, David McNelis <dmcnelis@gmail.com> wrote:
>
>> So, I had 7 nodes that I set up using vnodes, 256 tokens each, no problem.
>>
>> I added two 512 token nodes, no problem, things seemed to balance.
>>
>> The next 3 nodes I added, all at 256 tokens, have a cumulative
>> load of 116 MB (whereas the other nodes are at ~100GB and ~200GB, for 256
>> and 512 tokens respectively).
>>
>> Anyone else seen this in 1.2.4?
>>
>> The nodes seem to join the cluster ok, and I have num_tokens set and have
>> tried both an empty initial_token and a commented out initial token, with
>> no change.
>>
>> I see nothing streaming with netstats either, though these nodes were
>> added days apart.  At first I thought I must have a hot key or something,
>> but that doesn't seem to be the case, since the node I thought that one was
>> on has evened out over the past couple of days with no new nodes added.
>>
>> I really *DON'T* want to deal with another shuffle....but what options do
>> I have, since vnodes "make it unneeded to balance the cluster"?  (which, at
>> the moment, seems like a load of bullshit).
>>
>
>
>
> --
> Sam Overton
> Acunu | http://www.acunu.com | @acunu
>

--089e0149423259cd5b04db446d9a--