cassandra-user mailing list archives

From Ran Tavory <ran...@gmail.com>
Subject Re: Bootstrapping taking long
Date Wed, 05 Jan 2011 07:42:44 GMT
I was able to make the node join the ring but I'm confused.
When I first added the node, it was not in its own seeds list. AFAIK this is
how it's supposed to be. It was able to transfer all the data to itself from
the other nodes, but then it stayed in the bootstrapping state.
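
For reference, the stuck state is the one visible in the `nodetool streams` output quoted further down: "Mode: Bootstrapping" with no streams in either direction. Below is a minimal sketch of how that check could be scripted. It assumes the output has exactly the three-line shape shown in this thread; the function names are made up for illustration, and a single idle reading only suggests a stuck node if it persists over time.

```python
import subprocess


def read_streams(host="localhost", port=9004, nodetool="bin/nodetool"):
    """Run `nodetool streams` as quoted in this thread and return its text.

    The default host, port and path mirror the command in the thread;
    adjust them for your own installation.
    """
    result = subprocess.run(
        [nodetool, "-p", str(port), "-h", host, "streams"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def is_stuck_bootstrap(streams_output):
    """True if the node reports Bootstrapping mode with no stream activity."""
    lines = [line.strip() for line in streams_output.splitlines() if line.strip()]
    mode = None
    for line in lines:
        if line.startswith("Mode:"):
            mode = line.split(":", 1)[1].strip()
    idle_send = "Not sending any streams." in lines
    idle_recv = "Not receiving any streams." in lines
    return mode == "Bootstrapping" and idle_send and idle_recv


# Output copied from the thread.
sample = """Mode: Bootstrapping
Not sending any streams.
Not receiving any streams."""
print(is_stuck_bootstrap(sample))  # prints True
```

Calling `is_stuck_bootstrap(read_streams())` from a cron job or a loop would show whether the idle state lasts for hours, as it did here.
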
What I then did (and I don't know why it works) was add this node to the seeds
list in its own storage-conf.xml file and restart the server. Now I finally
see it in the ring...
If I had put the node in its own seeds list when first joining it, it would
not have joined the ring, but doing it in two phases worked.
So it's either my misunderstanding or a bug...
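
Since the behaviour seems to hinge on whether the node lists itself as a seed, here is a rough sketch of how to check that from a script. It assumes the 0.6-style storage-conf.xml layout with `Seeds`/`Seed`, `ListenAddress` and `AutoBootstrap` elements directly under the root; the function name and the addresses in the example fragment are hypothetical. It compares strings only, so a hostname in one place and an IP address in the other will not match.

```python
import xml.etree.ElementTree as ET


def seed_config(conf_xml):
    """Summarise the seed-related settings of a storage-conf.xml document."""
    root = ET.fromstring(conf_xml)
    seeds = [s.text.strip() for s in root.findall("./Seeds/Seed") if s.text]
    listen = (root.findtext("ListenAddress") or "").strip()
    auto = (root.findtext("AutoBootstrap") or "").strip().lower() == "true"
    return {
        "seeds": seeds,
        "listen_address": listen,
        "auto_bootstrap": auto,
        # Plain string comparison: no DNS resolution is attempted.
        "is_own_seed": listen in seeds,
    }


# Hypothetical fragment; the addresses are invented for illustration.
example = """<Storage>
  <AutoBootstrap>true</AutoBootstrap>
  <Seeds>
    <Seed>10.0.0.1</Seed>
    <Seed>10.0.0.2</Seed>
  </Seeds>
  <ListenAddress>10.0.0.3</ListenAddress>
</Storage>"""

info = seed_config(example)
print(info["is_own_seed"], info["auto_bootstrap"])  # prints: False True
```

Running this against the config before the first start and again before the restart would make the difference between the two phases explicit.
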

On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory <rantav@gmail.com> wrote:

> The new node does not see itself as part of the ring, it sees all others
> but itself, so from that perspective the view is consistent.
> The only problem is that the node never finishes bootstrapping. It has
> stayed in this state for hours (it's been 20 hours now...)
>
>
> $ bin/nodetool -p 9004 -h localhost streams
>> Mode: Bootstrapping
>> Not sending any streams.
>> Not receiving any streams.
>
>
> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall <nate@riptano.com> wrote:
>
>> Does the new node have itself in the list of seeds, by any chance? This
>> could cause some issues if so.
>>
>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory <rantav@gmail.com> wrote:
>> > I'm still at a loss. I haven't been able to resolve this. I tried
>> > adding another node at a different location on the ring, but this node
>> > too remains stuck in the bootstrapping state for many hours without
>> > any of the other nodes being busy with anticompaction or anything
>> > else. I don't know what's keeping it from finishing the bootstrap: no
>> > CPU, no I/O, files were already streamed, so what is it waiting for?
>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
>> > be anything addressing a similar issue so I figured there was no point
>> > in upgrading. But let me know if you think there is.
>> > Or any other advice...
>> >
>> > On Tuesday, January 4, 2011, Ran Tavory <rantav@gmail.com> wrote:
>> >> Thanks Jake, but unfortunately the streams directory is empty, so I
>> >> don't think any of the nodes is anti-compacting data right now or has
>> >> been in the past 5 hours. It seems that all the data was already
>> >> transferred to the joining host, but the joining node, after having
>> >> received the data, still remains in bootstrapping mode and does not join
>> >> the cluster. I'm not sure that *all* data was transferred (perhaps other
>> >> nodes need to transfer more data), but nothing is actually happening, so
>> >> I assume all of it has been moved.
>> >> Perhaps it's a configuration error on my part. Should I use
>> >> AutoBootstrap=true? Anything else I should look out for in the
>> >> configuration file or elsewhere?
>> >>
>> >>
>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani <jakers@gmail.com> wrote:
>> >>
>> >> In 0.6, locate the node doing anti-compaction and look in the "streams"
>> >> subdirectory in the keyspace data dir to monitor the anti-compaction
>> >> progress (it puts new SSTables for the bootstrapping node in there).
>> >>
>> >>
>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <rantav@gmail.com> wrote:
>> >>
>> >>
>> >> Running nodetool decommission didn't help. Actually, the node refused to
>> >> decommission itself (b/c it wasn't part of the ring). So I simply stopped
>> >> the process, deleted all the data directories and started it again. It
>> >> worked in the sense that the node bootstrapped again, but as before,
>> >> after it had finished moving the data nothing happened for a long time
>> >> (I'm still waiting, but nothing seems to be happening).
>> >>
>> >>
>> >>
>> >>
>> >> Any hints on how to analyze a "stuck" bootstrapping node? Thanks
>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <rantav@gmail.com> wrote:
>> >> Thanks Shimi, so indeed anticompaction was run on one of the other
>> >> nodes from the same DC, but to my understanding it has already ended,
>> >> a few hours ago...
>> >>
>> >>
>> >>
>> >> I see plenty of log messages such as [1], which ended a couple of hours
>> >> ago, and I've seen the new node streaming and accepting the data from the
>> >> node which performed the anticompaction. So far it was normal, so it
>> >> seemed that the data is in its right place. But now the new node seems
>> >> sort of stuck. None of the other nodes is anticompacting right now or has
>> >> been anticompacting since then.
>> >>
>> >>
>> >>
>> >>
>> >> The new node's CPU is close to zero and its iostats are almost zero, so
>> >> I can't find another bottleneck that would keep it hanging.
>> >> On IRC someone suggested I retry joining this node,
>> >> e.g. decommission it and join it again. I'll try it now...
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
>> CompactionManager.java (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>> >>
>> >>
>> >>
>> >>
>> >>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683
>> CompactionManager.java (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>> >>
>> >>
>> >>
>> >>
>> >>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132
>> CompactionManager.java (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>> >>
>> >>
>> >>
>> >>
>> >>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486
>> CompactionManager.java (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shimi.k@gmail.com> wrote:
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> In my experience most of the time it takes for a node to join the
>> >> cluster is spent on the anticompaction on the other nodes. The streaming
>> >> part is very fast.
>> >> Check the other nodes' logs to see if there is any node doing
>> >> anticompaction. I don't remember how much data I had in the cluster when
>> >> I needed to add/remove nodes. I do remember that it took a few hours.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> The node will join the ring only when it finishes the bootstrap.
>> >> --
>> >> /Ran
>> >>
>> >>
>> >
>> > --
>> > /Ran
>> >
>>
>
>
>
> --
> /Ran
>
>


-- 
/Ran
