From: Ran Tavory
Date: Wed, 5 Jan 2011 17:05:04 +0200
Subject: Re: Bootstrapping taking long
To: user@cassandra.apache.org
Cc: Marco Supino

In storage-conf I see this comment [1], from which I understand that the
recommended way to bootstrap a new node is to set AutoBootstrap=true and
remove the node from its own seeds list.
Moreover, I did try to set AutoBootstrap=true and have the node in its own
seeds list, but it would not bootstrap. I don't recall the exact message, but
it was something like "I found myself in the seeds list, therefore I'm not
going to bootstrap even though AutoBootstrap is true".

[1]
  <!--
   ~ Turn on to make new [non-seed] nodes automatically migrate the right data
   ~ to themselves.  (If no InitialToken is specified, they will pick one
   ~ such that they will get half the range of the most-loaded node.)
   ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
   ~ so that you can't subsequently accidently bootstrap a node with
   ~ data on it.  (You can reset this by wiping your data and commitlog
   ~ directories.)
   ~
   ~ Off by default so that new clusters and upgraders from 0.4 don't
   ~ bootstrap immediately.  You should turn this on when you start adding
   ~ new nodes to a cluster that already has data on it.  (If you are upgrading
   ~ from 0.4, start your cluster with it off once before changing it to true.
   ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
   ~ I/O before your cluster starts up.)
  -->
  <AutoBootstrap>false</AutoBootstrap>
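
To make the rule described above concrete, here is a minimal Python sketch of
the check as I understand it from this thread: a node bootstraps only if
AutoBootstrap is true and the node is not in its own seeds list. This is not
Cassandra's actual code. The sample file, the addresses, and the function name
bootstrap_expectation are made up for illustration, and the element names
(Seeds/Seed, ListenAddress) are assumed to match the 0.6 sample
storage-conf.xml. It also compares addresses as plain strings, which would
miss a hostname that resolves to a seed's IP.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down storage-conf.xml (addresses are invented).
SAMPLE_CONF = """
<Storage>
  <AutoBootstrap>true</AutoBootstrap>
  <Seeds>
    <Seed>10.0.0.1</Seed>
    <Seed>10.0.0.2</Seed>
  </Seeds>
  <ListenAddress>10.0.0.3</ListenAddress>
</Storage>
"""


def bootstrap_expectation(conf_xml: str) -> str:
    """Guess whether a node with this config would bootstrap on first start."""
    root = ET.fromstring(conf_xml.strip())
    auto = (root.findtext("AutoBootstrap") or "false").strip().lower() == "true"
    seeds = {s.text.strip() for s in root.iter("Seed") if s.text}
    listen = (root.findtext("ListenAddress") or "").strip()
    if not auto:
        return "no bootstrap: AutoBootstrap is false"
    if listen in seeds:
        # Behavior reported in this thread, not verified against the source.
        return "no bootstrap: node lists itself as a seed"
    return "bootstrap expected"


if __name__ == "__main__":
    print(bootstrap_expectation(SAMPLE_CONF))
```

Changing ListenAddress in the sample to one of the seed addresses makes the
function report the "found myself in the seeds list" case.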

On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn <david@lookin2.com> wrote:
> If "seed list should be the same across the cluster", that means that nodes
> *should* have themselves as a seed. If that doesn't work for Ran, then that
> is the first problem, no?


> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani <jakers@gmail.com> wrote:
>> Well, your ring issues don't make sense to me; the seed list should be the
>> same across the cluster.
>> I'm just thinking of other things to try: non-bootstrapped nodes should
>> join the ring instantly, but reads will fail if you aren't using quorum.


>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory <rantav@gmail.com> wrote:
>>> I haven't tried repair. Should I?

>>> On Jan 5, 2011 3:48 PM, "Jake Luciani" <jakers@gmail.com> wrote:
>>> > Have you tried not bootstrapping, but setting the token and manually
>>> > calling repair?
>>> >
>>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory <rantav@gmail.com> wrote:
>>> >
>>> >> My conclusion is lame: I tried this on several hosts and saw the same
>>> >> behavior. The only way I was able to join new nodes was to first start
>>> >> them when they are *not in* their own seeds list and, after they finish
>>> >> transferring the data, restart them with themselves *in* their own
>>> >> seeds list. After doing that the node would join the ring.
>>> >> This is either my misunderstanding or a bug, but the only place I found
>>> >> it documented stated that the new node should not be in its own seeds
>>> >> list.
>>> >> Version 0.6.6.
>>> >>
>>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn <david@lookin2.com> wrote:
>>> >>
>>> >>> My nodes all have themselves in their list of seeds - always did - and
>>> >>> everything works. (You may ask why I did this. I don't know, I must
>>> >>> have copied it from an example somewhere.)
>>> >>>
>>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory <rantav@gmail.com> wrote:
>>> >>>
>>> >>>> I was able to make the node join the ring, but I'm confused.
>>> >>>> What I did is, first, when adding the node, this node was not in its
>>> >>>> own seeds list. AFAIK this is how it's supposed to be. So it was able
>>> >>>> to transfer all data to itself from other nodes, but then it stayed in
>>> >>>> the bootstrapping state.
>>> >>>> So what I did (and I don't know why it works) is add this node to the
>>> >>>> seeds list in its own storage-conf.xml file, then restart the server,
>>> >>>> and then I finally saw it in the ring...
>>> >>>> If I had added the node to its own seeds list when first joining it,
>>> >>>> it would not have joined the ring, but when I did it in two phases it
>>> >>>> worked.
>>> >>>> So it's either my misunderstanding or a bug...
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory <rantav@gmail.com> wrote:
>>> >>>>
>>> >>>>> The new node does not see itself as part of the ring; it sees all
>>> >>>>> others but itself, so from that perspective the view is consistent.
>>> >>>>> The only problem is that the node never finishes bootstrapping. It
>>> >>>>> stays in this state for hours (it's been 20 hours now...)
>>> >>>>>
>>> >>>>>
>>> >>>>> $ bin/nodetool -p 9004 -h localhost streams
>>> >>>>>> Mode: Bootstrapping
>>> >>>>>> Not sending any streams.
>>> >>>>>> Not receiving any streams.
>>> >>>>>
>>> >>>>>
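
The symptom quoted above (still in bootstrapping mode, with no streams in
either direction) can be detected with a small script. The Python sketch below
only parses text that has already been captured from `nodetool streams`. It
assumes the three output lines appear exactly as quoted in this thread. The
name looks_stuck and the "Mode: Normal" string in the usage note are made up
for illustration.

```python
# Output captured from `bin/nodetool -p 9004 -h localhost streams`,
# copied from the message quoted above.
SAMPLE_OUTPUT = """Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.
"""


def looks_stuck(streams_output: str) -> bool:
    """True if the node reports bootstrapping mode but no stream activity."""
    lines = [ln.strip() for ln in streams_output.splitlines() if ln.strip()]
    bootstrapping = "Mode: Bootstrapping" in lines
    idle = ("Not sending any streams." in lines
            and "Not receiving any streams." in lines)
    return bootstrapping and idle


if __name__ == "__main__":
    print(looks_stuck(SAMPLE_OUTPUT))
```

A single idle sample proves little, since a node may be waiting on
anticompaction elsewhere. The check is more meaningful when it stays true over
many samples, as in the 20 hours reported here. Output with a different mode
line, for example a hypothetical "Mode: Normal", makes the function return
False.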
>>> >>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall <nate@riptano.com> wrote:
>>> >>>>>
>>> >>>>>> Does the new node have itself in the list of seeds, perchance? This
>>> >>>>>> could cause some issues if so.
>>> >>>>>>
>>> >>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory <rantav@gmail.com> wrote:
>>> >>>>>> > I'm still at a loss. I haven't been able to resolve this. I tried
>>> >>>>>> > adding another node at a different location on the ring, but this
>>> >>>>>> > node too remains stuck in the bootstrapping state for many hours
>>> >>>>>> > without any of the other nodes being busy with anti-compaction or
>>> >>>>>> > anything else. I don't know what's keeping it from finishing the
>>> >>>>>> > bootstrap: no CPU, no I/O, files were already streamed, so what is
>>> >>>>>> > it waiting for?
>>> >>>>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem
>>> >>>>>> > to be anything addressing a similar issue, so I figured there was
>>> >>>>>> > no point in upgrading. But let me know if you think there is.
>>> >>>>>> > Or any other advice...
>>> >>>>>> >
>>> >>>>>> > On Tuesday, January 4, 2011, Ran Tavory <rantav@gmail.com> wrote:
>>> >>>>>> >> Thanks Jake, but unfortunately the streams directory is empty, so
>>> >>>>>> >> I don't think that any of the nodes is anti-compacting data right
>>> >>>>>> >> now or has been in the past 5 hours. It seems that all the data
>>> >>>>>> >> was already transferred to the joining host, but the joining node,
>>> >>>>>> >> after having received the data, would still remain in
>>> >>>>>> >> bootstrapping mode and not join the cluster. I'm not sure that
>>> >>>>>> >> *all* data was transferred (perhaps other nodes need to transfer
>>> >>>>>> >> more data), but nothing is actually happening, so I assume all has
>>> >>>>>> >> been moved.
>>> >>>>>> >> Perhaps it's a configuration error on my part. Should I use
>>> >>>>>> >> AutoBootstrap=true? Anything else I should look out for in the
>>> >>>>>> >> configuration file or elsewhere?
>>> >>>>>> >>
>>> >>>>>> >>
>>> >>>>>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani <jakers@gmail.com> wrote:
>>> >>>>>> >>
>>> >>>>>> >> In 0.6, locate the node doing anti-compaction and look in the
>>> >>>>>> >> "streams" subdirectory in the keyspace data dir to monitor the
>>> >>>>>> >> anti-compaction progress (it puts new SSTables for the
>>> >>>>>> >> bootstrapping node in there).
>>> >>>>>> >>
>>> >>>>>> >>
>>> >>>>>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <rantav@gmail.com> wrote:
>>> >>>>>> >>
>>> >>>>>> >> Running nodetool decommission didn't help. Actually the node
>>> >>>>>> >> refused to decommission itself (because it wasn't part of the
>>> >>>>>> >> ring). So I simply stopped the process, deleted all the data
>>> >>>>>> >> directories and started it again. It worked in the sense that the
>>> >>>>>> >> node bootstrapped again but, as before, after it had finished
>>> >>>>>> >> moving the data nothing happened for a long time (I'm still
>>> >>>>>> >> waiting, but nothing seems to be happening).
>>> >>>>>> >>
>>> >>>>>> >> Any hints on how to analyze a "stuck" bootstrapping node? Thanks.
>>> >>>>>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <rantav@gmail.com> wrote:
>>> >>>>>> >> Thanks Shimi, so indeed anticompaction was run on one of the other
>>> >>>>>> >> nodes from the same DC, but to my understanding it has already
>>> >>>>>> >> ended, a few hours ago...
>>> >>>>>> >>
>>> >>>>>> >> I see plenty of log messages such as [1], which ended a couple of
>>> >>>>>> >> hours ago, and I've seen the new node streaming and accepting the
>>> >>>>>> >> data from the node which performed the anticompaction, and so far
>>> >>>>>> >> it was normal, so it seemed that the data is in its right place.
>>> >>>>>> >> But now the new node seems sort of stuck. None of the other nodes
>>> >>>>>> >> is anticompacting right now or has been anticompacting since then.
>>> >>>>>> >>
>>> >>>>>> >> The new node's CPU is close to zero and its iostats are almost
>>> >>>>>> >> zero, so I can't find another bottleneck that would keep it
>>> >>>>>> >> hanging.
>>> >>>>>> >> On IRC someone suggested I retry joining this node, e.g.
>>> >>>>>> >> decommission it and rejoin it again. I'll try it now...
>>> >>>>>> >>
>>> >>>>>> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>> >>>>>> >>
>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>>> >>>>>> >>
>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>>> >>>>>> >>
>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>> >>>>>> >>
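
To check the claim that anticompaction ended hours ago, one option is to pull
the timestamp of the last AntiCompacting entry out of each node's log. The
Python sketch below does that. The regular expression is written against the
log lines quoted above, so it assumes other 0.6 nodes log in the same layout.
The name last_anticompaction is made up, and the sample lines are shortened
copies of the entries in this thread.

```python
import re
from datetime import datetime

# Matches the AntiCompacting entries quoted in this thread.
ANTICOMPACTION = re.compile(
    r"INFO\s+\[COMPACTION-POOL:\d+\]\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"CompactionManager\.java \(line \d+\) AntiCompacting"
)

# Two entries from the thread; the SSTable lists are shortened to one path.
SAMPLE_LOG = """\
INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
"""


def last_anticompaction(log_text: str):
    """Return the time of the latest AntiCompacting entry, or None if absent."""
    stamps = [
        datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        for m in ANTICOMPACTION.finditer(log_text)
    ]
    return max(stamps) if stamps else None


if __name__ == "__main__":
    print(last_anticompaction(SAMPLE_LOG))
```

Note that these entries mark when an anticompaction started, not when it
finished, so a recent timestamp means work may still be in progress, while an
old one is only a hint that it is done.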
>>> >>>>>> >> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shimi.k@gmail.com> wrote:
>>> >>>>>> >>
>>> >>>>>> >> In my experience, most of the time it takes for a node to join the
>>> >>>>>> >> cluster is spent on anticompaction on the other nodes. The
>>> >>>>>> >> streaming part is very fast.
>>> >>>>>> >> Check the other nodes' logs to see if there is any node doing
>>> >>>>>> >> anticompaction. I don't remember how much data I had in the
>>> >>>>>> >> cluster when I needed to add/remove nodes. I do remember that it
>>> >>>>>> >> took a few hours.
>>> >>>>>> >>
>>> >>>>>> >> The node will join the ring only when it finishes the bootstrap.

--
/Ran
