cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Questions about bootstrapping and compactions during bootstrapping
Date Sun, 21 Dec 2014 12:05:05 GMT
"*Is it reasonable to do “nodetool disableautocompaction” on the
bootstrapping node?*" --> It's a tricky question

By default, compaction is there to ensure you don't accumulate too many
small SSTables, which hurt the read path. That said, in production I've seen
people temporarily disable autocompaction during bootstrap because the
new node cannot keep up with compaction.

But before disabling autocompaction, I would advise tuning the streaming
throughput first (stream_throughput_outbound_megabits_per_sec in
cassandra.yaml, or "nodetool setstreamthroughput" at runtime). Lowering this
limit gives the new node room to compact, but makes the joining process take
longer. Trade-off, as always.
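To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes streaming is purely bandwidth-bound and that the streaming throughput cap, which Cassandra expresses in megabits per second, applies to each sending node, so the joining node receives at most cap × number of sources. The helper name and the 180 GB figure are hypothetical, and the estimate ignores compaction, disk and network contention, so treat it as a lower bound rather than a prediction.

```python
def estimate_stream_hours(data_gb: float, cap_mbps: float, sources: int = 1) -> float:
    """Hypothetical helper: lower-bound streaming time in hours.

    Assumes the only bottleneck is a per-source outbound cap of cap_mbps
    megabits per second, with `sources` nodes streaming concurrently.
    """
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (cap_mbps * sources)
    return seconds / 3600


# Hypothetical example: 180 GB to stream from two source nodes.
for cap in (200, 100):
    # prints "cap=200 Mb/s -> 1.0 h", then "cap=100 Mb/s -> 2.0 h"
    print(f"cap={cap} Mb/s -> {estimate_stream_hours(180, cap, sources=2):.1f} h")
```

Halving the cap doubles the minimum streaming time; the I/O and CPU that streaming no longer consumes is what the new node can spend on compaction.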

On Wed, Dec 17, 2014 at 1:32 AM, Donald Smith <Donald.Smith@audiencescience.com> wrote:

> Looking at the output of "nodetool netstats", I see that the
> bootstrapping node is pulling from only two of the nine nodes currently in
> the datacenter. That surprises me: I'd think the vnodes it pulls from
> would be randomly spread across the existing nodes. We're using Cassandra
> 2.0.11 with 256 vnodes each.
>
> I also notice that while bootstrapping, the node is quite busy doing
> compactions. There are over 1000 pending compactions on the new node and
> it hasn't finished bootstrapping. I'd think those would be unnecessary,
> since the other nodes in the datacenter have zero pending compactions.
> Perhaps the compactions explain why running "du -hs
> /var/lib/cassandra/data" on the new node shows more disk space usage than
> on the old nodes.
>
> *Is it reasonable to do “nodetool disableautocompaction” on the
> bootstrapping node? Should that be the default???*
>
> If I start bootstrapping one node, it's not yet in the cluster, but it
> decides which token ranges it owns and requests streams for that data.
> If I then try to bootstrap a SECOND node concurrently, it will take over
> ownership of some token ranges from the first node. Will the first node
> then adjust what data it streams?
>
> It seems to me the Cassandra server needs to keep track of both the OLD
> and the NEW token ranges and vnodes. I'm not convinced that running
> two bootstraps concurrently (starting the second one after several minutes
> of delay) is safe.
>
> Thanks, Don
>
> *Donald A. Smith* | Senior Software Engineer
> P: 425.201.3900 x 3866
> C: (206) 819-5965
> F: (646) 443-2333
> donalds@AudienceScience.com
>
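Donald's expectation about vnode spread can be checked against a simplified model of a Murmur3-style token ring: nine nodes with 256 random tokens each, plus a joining node that picks 256 random tokens of its own. Each token owns the range from the previous token (exclusive) up to itself, so a new token takes over part of the range held by the next token clockwise. The sketch counts how many distinct existing nodes that touches. It is only an illustration under stated assumptions: it ignores the replication factor and the way Cassandra chooses a streaming source for each range, so it says nothing definitive about what "nodetool netstats" reports. The helper name and seed are hypothetical.

```python
import bisect
import random

# Murmur3Partitioner token range (the default partitioner since Cassandra 1.2).
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1


def previous_owners(num_nodes: int = 9, vnodes: int = 256, seed: int = 42) -> set:
    """Hypothetical helper: which existing nodes lose a range to a joining node?

    Simplified model: RF=1, random token assignment, no snitch or source selection.
    """
    rng = random.Random(seed)
    ring = sorted(
        (rng.randint(MIN_TOKEN, MAX_TOKEN), f"node{n}")
        for n in range(num_nodes)
        for _ in range(vnodes)
    )
    tokens = [token for token, _ in ring]
    owners = set()
    for _ in range(vnodes):
        new_token = rng.randint(MIN_TOKEN, MAX_TOKEN)
        # A token owns (predecessor, token], so the new token splits the range of
        # the first existing token >= new_token, wrapping to index 0 past the end.
        i = bisect.bisect_left(tokens, new_token) % len(tokens)
        owners.add(ring[i][1])
    return owners


print(len(previous_owners()))
```

In this model the chance that a given node owns none of the affected ranges is roughly (8/9)^256, which is negligible, so all nine nodes should appear; a netstats view showing only two sources would need some other explanation than token placement alone.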
