Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of ruchir.jha@gmail.com
 designates 209.85.217.178 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAFz1oU2jrN+ctjdohqt0cMCKr27z6DXBu-VZiDkoaEROgcvtBw@mail.gmail.com>
References: 
 <CALyJi3G6w-d0maf3GxnysnvEa0UZKFg3PEbeytow+fkKscWbkA@mail.gmail.com>
	<CAFz1oU2jrN+ctjdohqt0cMCKr27z6DXBu-VZiDkoaEROgcvtBw@mail.gmail.com>
Date: Tue, 5 Aug 2014 09:57:31 -0400
Message-ID: 
 <CALyJi3EV_17dmks9=zbEbLnNgocwobH7yTTiaWXo75F84FdRdg@mail.gmail.com>
Subject: Re: Node bootstrap
From: Ruchir Jha <ruchir.jha@gmail.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary=001a11c3bfaaa59ccc04ffe23c0c

--001a11c3bfaaa59ccc04ffe23c0c
Content-Type: text/plain; charset=UTF-8

Thanks, Patricia, for your response!

On the new node, I just see a lot of the following:

INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400) Writing Memtable
INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java (line 262) Compacted 12 sstables to

So basically it is just busy flushing and compacting. Would you have any
ideas on why disk usage has blown up to roughly 2x? My understanding was
that if initial_token is left empty on the new node, it just contacts the
heaviest node and bisects its token range. The heaviest node is around
2.1 TB, and the new node is already at 4 TB. Could this be because
compaction is falling behind?
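
As a rough check of the numbers above, here is a minimal sketch of the arithmetic. It assumes the simplified model described in this message (the new node ends up with about half of the heaviest node's data) and ignores replication, compression, and sstables that have not been compacted yet. The function names are made up for illustration and are not part of Cassandra.

```python
# Rough disk-usage check for the bootstrapping node.
# The two figures below come from this thread; everything else is an assumption.
HEAVIEST_NODE_TB = 2.1      # size of the heaviest existing node
OBSERVED_NEW_NODE_TB = 4.0  # current size of the new node


def expected_after_bisect(heaviest_tb: float) -> float:
    """Assumed model: the new node takes over half of the heaviest node's data."""
    return heaviest_tb / 2.0


def blowup_ratio(observed_tb: float, expected_tb: float) -> float:
    """How many times larger the observed size is than the expected size."""
    return observed_tb / expected_tb


expected = expected_after_bisect(HEAVIEST_NODE_TB)
ratio = blowup_ratio(OBSERVED_NEW_NODE_TB, expected)
print(f"expected ~{expected:.2f} TB, observed {OBSERVED_NEW_NODE_TB:.1f} TB, "
      f"ratio ~{ratio:.1f}x")
```

Under that assumption the new node would be expected to hold roughly 1 TB, so 4 TB is closer to a 4x overshoot than 2x.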

Ruchir


On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <patricia@thelastpickle.com>
wrote:

> Ruchir,
>
> What exactly are you seeing in the logs? Are you running major compactions
> on the new bootstrapping node?
>
> With respect to the seed list, it is generally advisable to use 3 seed
> nodes per AZ / DC.
>
> Cheers,
>
>
> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ruchir.jha@gmail.com> wrote:
>
>> I am trying to bootstrap the thirteenth node in a 12 node cluster where
>> the average data size per node is about 2.1 TB. The bootstrap streaming has
>> been going on for 2 days now, and the disk size on the new node is already
>> above 4 TB and still going. Is this because the new node is running major
>> compactions while the streaming is going on?
>>
>> One thing I noticed that seemed off was that the seeds property in the
>> yaml of the 13th node comprises nodes 1..12, whereas the seeds property
>> on the existing 12 nodes lists all the other nodes except the thirteenth
>> node. Is this an issue?
>>
>> Any other insight is appreciated.
>>
>> Ruchir.
>>
>>
>>
>
>
> --
> Patricia Gorla
> @patriciagorla
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com <http://thelastpickle.com>
>

--001a11c3bfaaa59ccc04ffe23c0c--