Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of
 rodrigofelixdealmeida@gmail.com designates 209.85.220.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAORswtxWYO72m4=J5TXC5GE9j+iFbYB-o7OPy4ZoWuRVcXbq_Q@mail.gmail.com>
References: 
 <CAD4g+KwHT6jOuL4j9v6+4dp2ZKc8dHQB_MmGCLgOkHUALBOAtA@mail.gmail.com>
 <CAEDUwd1XyChxBYTBEDu2yxiBprjq8cR7pQmTU1r5Aom0GJt-UQ@mail.gmail.com>
 <CAD4g+KxHrhoE8hVFkpgZHGR4jZv=q_oP7FZ-qezNWbDmXQJ96w@mail.gmail.com>
 <CAORswtxWYO72m4=J5TXC5GE9j+iFbYB-o7OPy4ZoWuRVcXbq_Q@mail.gmail.com>
From: Rodrigo Felix <rodrigofelixdealmeida@gmail.com>
Date: Wed, 10 Jul 2013 15:23:27 -0300
Message-ID: 
 <CAD4g+KxAm4POR99BHNjUcLaVzVtSECht7KSWht_1J2SH5oQFrg@mail.gmail.com>
Subject: Re: General doubts about bootstrap
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=047d7b163257e3fe9304e12c6017

--047d7b163257e3fe9304e12c6017
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Currently, I'm using Cassandra 1.1.5, but I'm considering upgrading to
1.2.x in order to make use of vnodes.
Doubling the cluster size is not an option for me because I want to measure
the response while adding (or removing) single nodes.
Thank you guys. It helped me a lot to better understand how Cassandra works.
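A rough way to see why vnodes suit adding one node at a time: the sketch below simulates vnode token placement, assuming tokens are chosen at random (which is how 1.2 picks them when `num_tokens` is set and `initial_token` is left empty). It shows a fourth node ending up with roughly a quarter of the ring, taken a little from every existing node, without any `move`. The ring size, node names, seed and 256 tokens per node are illustrative choices, not values from this thread.

```python
import random
from collections import defaultdict

RING = 2**64  # illustrative token-space size, not tied to a specific partitioner

def add_node(ring, node, num_tokens, rng):
    """Give `node` num_tokens random tokens; existing tokens are left untouched."""
    for _ in range(num_tokens):
        ring[rng.randrange(RING)] = node

def shares(ring):
    """Fraction of the ring owned by each node; each token owns (previous token, token]."""
    tokens = sorted(ring)
    owned = defaultdict(int)
    for i, tok in enumerate(tokens):
        # tokens[-1] for i == 0 makes the first range wrap around the ring
        owned[ring[tok]] += (tok - tokens[i - 1]) % RING
    return {node: amount / RING for node, amount in owned.items()}

rng = random.Random(2013)
ring = {}
for name in ("n1", "n2", "n3"):
    add_node(ring, name, 256, rng)
before = shares(ring)
add_node(ring, "n4", 256, rng)  # bootstrap one extra node; nobody else moves
after = shares(ring)

for node in sorted(after):
    print(node, round(before.get(node, 0.0), 3), "->", round(after[node], 3))
```

With many tokens per node, each node's share stays close to 1/N, which is why a rebalance is not normally needed after a single node joins.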

Regards,

*Rodrigo Felix de Almeida*
LSBD - Universidade Federal do Ceará
Project Manager
MBA, CSM, CSPO, SCJP


On Wed, Jul 10, 2013 at 11:11 AM, Eric Stevens <mightye@gmail.com> wrote:

> > => Adding a new node between other nodes would avoid running move, but
> > the ring would be unbalanced, right? Would this imply having one node
> > overloaded (the one with the bigger range: 1/2 of the ring, while the
> > other 2 nodes have 1/4 each, supposing 3 nodes)? I'm referring to
> > http://wiki.apache.org/cassandra/Operations#Load_balancing
>
> Yes, if you're using a single vnode per server, or are running an older
> version of Cassandra. For lowest impact, doubling the size of your cluster
> is recommended so that you can avoid doing moves. Or if you're on
> Cassandra 1.2+, you can use vnodes, and you should not typically need to
> rebalance after bringing a new server online.
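To make the arithmetic concrete, here is a small sketch that computes each token's share of a RandomPartitioner ring (assumed token space 0 to 2**127, one token per node, replication ignored; the function name is hypothetical). Placing a third node halfway between two evenly spaced nodes leaves one node with half the ring and the other two with a quarter each, while doubling from two to four nodes, with each new token at the midpoint of an existing range, keeps every node at a quarter without any move.

```python
from fractions import Fraction

RING = 2**127  # RandomPartitioner token space [0, 2**127), assumed here

def ownership(tokens):
    """Fraction of the ring each token owns: the range (previous token, token]."""
    ordered = sorted(tokens)
    return {
        # ordered[i - 1] wraps to the largest token when i == 0
        tok: Fraction((tok - ordered[i - 1]) % RING or RING, RING)
        for i, tok in enumerate(ordered)
    }

two_nodes = [0, RING // 2]

# Add one node halfway between the existing two: no move needed, but unbalanced.
three_nodes = two_nodes + [RING // 4]
print(sorted(ownership(three_nodes).values()))

# Double the cluster: a new token at the midpoint of every existing range.
four_nodes = two_nodes + [RING // 4, 3 * RING // 4]
print(sorted(ownership(four_nodes).values()))
```

Exact fractions are used so the shares can be compared without floating-point noise.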
>
>
> On Tue, Jul 9, 2013 at 9:31 PM, Rodrigo Felix <
> rodrigofelixdealmeida@gmail.com> wrote:
>
>> Thank you very much for your response. My comments on your email
>> follow.
>>
>> Regards,
>>
>> *Rodrigo Felix de Almeida*
>> LSBD - Universidade Federal do Ceará
>> Project Manager
>> MBA, CSM, CSPO, SCJP
>>
>>
>> On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli <rcoli@eventbrite.com> wrote:
>>
>>> On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix <
>>> rodrigofelixdealmeida@gmail.com> wrote:
>>>
>>>>
>>>>    - Is it normal for adding a new node to take about 9 minutes? Below
>>>>    is the log generated by a script that adds a new node.
>>>>
>>> Sure. => OK
>>>
>>>>
>>>>    - Is there a way to reduce the time it takes Cassandra to start?
>>>>
>>> Not usually. => OK
>>>
>>>>
>>>>    - Sometimes the cleanup operation takes many minutes (about 10). Is
>>>>    this normal, given that the amount of data is small (1.7 GB at most per
>>>>    seed)?
>>>>
>>> Compaction is throttled, and cleanup is a type of compaction. Bootstrap
>>> is also throttled via the streaming throttle. => OK
>>>
>>>>
>>>>    - I start with two seeds, whose tokens are 0 and
>>>>    85070591730234615865843651857942052864. When I add a new machine,
>>>>    do I need to run move and cleanup on both seeds? Currently, I'm running
>>>>    cleanup on seed 0, move + cleanup on the other seed, and neither move
>>>>    nor cleanup on the newly added node. Is this OK?
>>>>
>>> Only nodes which have "lost" ranges need to run cleanup. In general you
>>> should add new nodes "between" other nodes such that "move" is not required
>>> at all.
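For reference, evenly spaced initial tokens for a RandomPartitioner ring (assumed here, as in 1.1's default configuration) are i * 2**127 / N. This sketch reproduces the two seed tokens from the question and lists which old tokens are absent from a balanced three-node layout: token 0 stays, but the second seed's token changes, which is the `move` that placing the new node "between" the existing ones avoids. The function name is hypothetical.

```python
RING = 2**127  # RandomPartitioner token space, assumed

def balanced_tokens(n):
    """Evenly spaced tokens for an n-node ring: token_i = i * 2**127 // n."""
    return [i * RING // n for i in range(n)]

two = balanced_tokens(2)    # the two seed tokens from the question
three = balanced_tokens(3)  # balanced layout after adding one node
print(two)
print(three)
# Tokens present in the old layout but not in the new one belong to nodes that must move.
print(sorted(set(two) - set(three)))
```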
>>>
>>
>> => Adding a new node between other nodes would avoid running move, but
>> the ring would be unbalanced, right? Would this imply having one node
>> overloaded (the one with the bigger range: 1/2 of the ring, while the
>> other 2 nodes have 1/4 each, supposing 3 nodes)? I'm referring to
>> http://wiki.apache.org/cassandra/Operations#Load_balancing
>>
>>>
>>>>    - What if I do not run cleanup on any existing node when adding or
>>>>    removing a node? Is the data that was not "cleaned up" still available
>>>>    if I send a scan, for instance, whose range is still on that node but
>>>>    would not be there if I had run cleanup? Or would the data be gathered
>>>>    from the other node, i.e. the one that properly owns the range specified
>>>>    in the scan query?
>>>>
>>> If data for range [x] is on node [a] but node [a] is no longer
>>> considered an endpoint for range [x], it will never receive a request to
>>> serve range [x]. => OK
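A minimal sketch of the routing rule described above, assuming RandomPartitioner (a key's token modelled as the absolute value of its MD5 digest read as a signed big-endian integer) and ignoring replication: a request for a token goes to the first node clockwise whose token is greater than or equal to it. Once a new node takes over part of a range, requests for that part go to the new node, even if the old node still has the data on disk. Node names and helper names are hypothetical.

```python
import bisect
import hashlib

def key_token(key: bytes) -> int:
    """Token for a row key, modelled on RandomPartitioner: abs(signed big-endian MD5)."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))

def primary_endpoint(token: int, ring: dict) -> str:
    """First node clockwise whose token is >= `token`, wrapping past the largest token.

    `ring` maps node name -> token. Replication (RF > 1) is ignored.
    """
    nodes = sorted(ring, key=ring.get)
    tokens = [ring[node] for node in nodes]
    return nodes[bisect.bisect_left(tokens, token) % len(nodes)]

before = {"a": 0, "b": 2**126}
after = {"a": 0, "c": 2**125, "b": 2**126}  # "c" bootstrapped between a and b

token = 2**124  # falls in the part of b's range that "c" takes over
print(primary_endpoint(token, before), "->", primary_endpoint(token, after))
print(primary_endpoint(key_token(b"some-row-key"), after))
```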
>>>
>>>>
>>>>    - After decommissioning a node, is it advisable to run cleanup on
>>>>    the remaining nodes? Are the consequences of not running it the same as
>>>>    not running it when adding a node?
>>>>
>>> Cleanup is only for the node which lost a range. In the decommission case,
>>> no live nodes lost a range, only some nodes gained one. => OK
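The rule "only the node which lost a range needs cleanup" can be checked mechanically. This sketch assumes RF=1, so each node owns only its primary range (previous token, own token] on a RandomPartitioner ring; with a higher replication factor, nodes also hold replicas of neighboring ranges and more nodes may lose data. It compares each surviving node's range before and after a topology change and reports the nodes whose old range is no longer fully covered. All names are hypothetical, and the "balanced" layout is just one possible rebalanced ring.

```python
RING = 2**127  # RandomPartitioner token space, assumed

def arc_len(start, end):
    """Number of tokens in the ring range (start, end]; start == end means the whole ring."""
    return (end - start) % RING or RING

def covers(outer, inner):
    """True if ring range `inner` lies entirely inside ring range `outer`."""
    (a1, b1), (a0, b0) = outer, inner
    if arc_len(a1, b1) == RING:
        return True
    # Walking clockwise from outer's start, inner must begin and end before outer ends.
    return (a0 - a1) % RING + arc_len(a0, b0) <= arc_len(a1, b1)

def primary_ranges(ring):
    """Map node -> (previous token, own token], its primary range under RF=1."""
    ordered = sorted(ring.items(), key=lambda item: item[1])
    return {node: (ordered[i - 1][1], tok) for i, (node, tok) in enumerate(ordered)}

def needs_cleanup(old_ring, new_ring):
    """Nodes present in both layouts whose new range no longer covers their old one."""
    old, new = primary_ranges(old_ring), primary_ranges(new_ring)
    return sorted(n for n in old if n in new and not covers(new[n], old[n]))

two = {"seed0": 0, "seed1": 2**126}
three = {"seed0": 0, "new": 2**125, "seed1": 2**126}
balanced = {"seed0": 0, "new": RING // 3, "seed1": 2 * RING // 3}

print(needs_cleanup(two, three))     # add "new" between the seeds, no move
print(needs_cleanup(three, two))     # decommission "new" again
print(needs_cleanup(two, balanced))  # add "new" and move seed1 to rebalance
```

Under these assumptions, inserting a node between the seeds only requires cleanup on the seed whose range shrank, decommissioning requires none, and rebalancing with a move shrinks both seeds' ranges.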
>>>
>>> =Rob
>>>
>>
>>
>

--047d7b163257e3fe9304e12c6017--