Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CACUnPaDenZ4nYqM6wR5Tc3U2-e1k_vGVvSW_iPY8W8iJw_t6fA@mail.gmail.com>
References: 
 <CAJpqPhif13j9fXPT_kmrL57bHyNsGYpDSRTZnBi1u56gNQmjww@mail.gmail.com>
 <CACUnPaArBMYkaeLcfw5+NTR2HDbCTsvBa7J2XW0PWS9qGQ1QrA@mail.gmail.com>
 <CAJpqPhhLYYQQDQbf-dB=9pdfr7cz26OO+Xu2m+2x4Sj=9UYYZw@mail.gmail.com>
 <CACUnPaDenZ4nYqM6wR5Tc3U2-e1k_vGVvSW_iPY8W8iJw_t6fA@mail.gmail.com>
From: sai krishnam raju potturi <pskraju88@gmail.com>
Date: Fri, 9 Oct 2015 10:07:55 -0400
Message-ID: 
 <CAJpqPhjHd-omgeSdPyE3KNXwO--poPZ5-pShdvB7B+EM=htpPw@mail.gmail.com>
Subject: Re: Re : Nodetool Cleanup on multiple nodes in parallel
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=94eb2c036320ebecff0521ac8345

--94eb2c036320ebecff0521ac8345
Content-Type: text/plain; charset=UTF-8

Thanks, Jonathan. I see an advantage in doing it one AZ or rack at a time.
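
For what it's worth, here is a rough sketch of how the rack-at-a-time approach could be scripted. It is only an illustration under stated assumptions: the host/rack mapping is made up (in practice it would come from `nodetool status`), the helper names and the `max_parallel` knob are hypothetical, and the only real command it relies on is `nodetool -h <host> cleanup`. It runs cleanup in parallel within one rack, waits for the whole rack to finish, and stops if any node reports a failure before moving on to the next rack.

```python
import subprocess
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host -> rack mapping; in practice, read it from `nodetool status`.
NODES = {
    "10.0.1.11": "rack1",
    "10.0.1.12": "rack1",
    "10.0.2.11": "rack2",
    "10.0.2.12": "rack2",
}


def group_by_rack(nodes):
    """Return {rack: [hosts]} with racks and hosts in sorted order."""
    racks = defaultdict(list)
    for host, rack in sorted(nodes.items()):
        racks[rack].append(host)
    return dict(sorted(racks.items()))


def nodetool_cleanup(host):
    """Run `nodetool -h <host> cleanup` and return its exit code."""
    return subprocess.call(["nodetool", "-h", host, "cleanup"])


def cleanup_rack_by_rack(nodes, run=nodetool_cleanup, max_parallel=4):
    """Clean up one rack at a time, with up to max_parallel nodes at once."""
    results = {}
    for rack, hosts in group_by_rack(nodes).items():
        # Leaving the `with` block waits for every cleanup in this rack.
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            codes = list(pool.map(run, hosts))
        results[rack] = dict(zip(hosts, codes))
        if any(codes):
            break  # a non-zero exit code: stop before touching the next rack
    return results


if __name__ == "__main__":
    print(cleanup_rack_by_rack(NODES))
```

Starting with max_parallel=1 on the first rack and raising it only if read latency stays acceptable would follow the "start one and see" suggestion below.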

On Thu, Oct 8, 2015 at 6:41 PM, Jonathan Haddad <jon@jonhaddad.com> wrote:

> My hunch is the bigger your cluster the less impact it will have, as each
> node takes part in a smaller and smaller % of total queries.  Considering
> that compaction is always happening, I'd wager if you've got a big cluster
> (as you say you do) you'll probably be ok running several cleanups at a
> time.
>
> I'd say start one, see how your perf is impacted (if at all) and go from
> there.
>
> If you're running a proper snitch you could probably do an entire rack /
> AZ at a time.
>
>
> On Thu, Oct 8, 2015 at 3:08 PM sai krishnam raju potturi <
> pskraju88@gmail.com> wrote:
>
>> We plan to do it during non-peak hours when customer traffic is less.
>> That sums up to 10 nodes a day, which is concerning as we have other data
>> centers to be expanded eventually.
>>
>> Cleanup is similar to compaction, which is CPU intensive and will affect
>> reads if this data center is serving traffic. Is running cleanup in
>> parallel advisable?
>>
>> On Thu, Oct 8, 2015, 17:53 Jonathan Haddad <jon@jonhaddad.com> wrote:
>>
>>> Unless you're close to running out of disk space, what's the harm in it
>>> taking a while?  How big is your DC?  At 45 min per node, you can do 32
>>> nodes a day.  Diverting traffic away from a DC just to run cleanup feels
>>> like overkill to me.
>>>
>>>
>>>
>>> On Thu, Oct 8, 2015 at 2:39 PM sai krishnam raju potturi <
>>> pskraju88@gmail.com> wrote:
>>>
>>>> Hi,
>>>>    Our Cassandra cluster currently runs DSE 4.6. The underlying
>>>> Cassandra version is 2.0.14.
>>>>
>>>> We are planning on adding multiple nodes to one of our datacenters.
>>>> This requires "nodetool cleanup". The "nodetool cleanup" operation
>>>> takes around 45 mins for each node.
>>>>
>>>> DataStax documentation recommends running "nodetool cleanup" on one
>>>> node at a time. That would take a really long time, owing to the size
>>>> of our datacenter.
>>>>
>>>> If we were to divert the read and write traffic away from a particular
>>>> datacenter, could we run "cleanup" on multiple nodes in parallel for
>>>> that datacenter?
>>>>
>>>>
>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
>>>>
>>>>
>>>> thanks
>>>> Sai
>>>>
>>>

--94eb2c036320ebecff0521ac8345--