cassandra-user mailing list archives

From Rodrigo Felix <>
Subject Re: General doubts about bootstrap
Date Wed, 10 Jul 2013 01:31:57 GMT
Thank you very much for your response. My comments on your email follow.


*Rodrigo Felix de Almeida*
LSBD - Universidade Federal do Ceará
Project Manager

On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli <> wrote:

> On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix <
>> wrote:
>>    - Is it normal to take about 9 minutes to add a new node? The log
>>    generated by a script to add a new node follows.
>> Sure.  => OK
>>    - Is there a way to reduce the time to start cassandra?
>> Not usually. => OK
>>    - Sometimes the cleanup operation takes many minutes (about 10). Is this
>>    normal, given that the amount of data is small (1.7 GB at maximum / seed)?
>> Compaction is throttled, and cleanup is a type of compaction. Bootstrap
> is also throttled via the streaming throttle. => OK
>>    - Considering that I have two seeds in the beginning, their tokens
>>    are 0 and 85070591730234615865843651857942052864. When I add a new machine,
>>    do I need to execute move and cleanup on both seeds? Currently, I'm running
>>    cleanup on seed 0, move + cleanup on the other seed and neither move nor
>>    cleanup on the just added node. Is this OK?
>> Only nodes which have "lost" ranges need to run cleanup. In general you
> should add new nodes "between" other nodes such that "move" is not required
> at all.

=> Adding a new node between other nodes would avoid running move, but the
ring would be unbalanced, right? Would this leave one node overloaded, since
with 3 nodes it would own 1/2 of the range while the other 2 nodes own 1/4
each? I'm referring
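
To make the balance question concrete, here is a minimal sketch that computes
evenly spaced tokens and the fraction of the ring each token owns. It assumes
RandomPartitioner (token space 0 to 2**127, which matches the two seed tokens
quoted above) and considers primary ranges only. The function names are
hypothetical helpers, not part of Cassandra or nodetool.

```python
# Assumption: RandomPartitioner, whose token space is [0, 2**127).
RING_SIZE = 2**127


def balanced_tokens(node_count):
    """Return evenly spaced tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A node owns the range (previous token, its own token], wrapping
    around the ring. Primary ranges only; replication is ignored.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        # A single node owns the whole ring, so a zero width means full width.
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = width / RING_SIZE
    return shares


if __name__ == "__main__":
    seeds = balanced_tokens(2)
    # Third node placed "between" existing nodes, with no move afterwards.
    unbalanced = seeds + [3 * RING_SIZE // 4]
    print(ownership(unbalanced))
    # Evenly spaced alternative, which requires moving an existing node.
    print(ownership(balanced_tokens(3)))
```

Placing the third node at 3/4 of the ring without a move leaves shares of
1/4, 1/2 and 1/4, whereas evenly spaced tokens give roughly 1/3 each.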

>>    - What if I do not run cleanup in any existing node when adding or
>>    removing a node? Is the data that was not "cleaned up" still available if I
>>    send a scan, for instance, and the scan range is still in the node but it
>>    wouldn't be there if I had run cleanup? Would the data be gathered from
>>    another node, i.e. the one that properly has the range specified in the
>>    scan query?
>> If data for range [x] is on node [a] but node [a] is no longer considered
> an endpoint for range [x], it will never receive a request to serve range
> [x]. => OK
>>    - After decommissioning a node, is it advisable to run cleanup on the
>>    remaining nodes? Are the consequences of not running it the same as when
>>    adding a node?
>> Cleanup is only for the node which lost a range. In decommission case, no
> live nodes lost a range, only some nodes gained one. => OK
> =Rob
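
The rule that only a node which "lost" a range needs cleanup can be sketched
as follows. This is an illustration under simplifying assumptions: primary
ranges only (as if the replication factor were 1), each node owning the range
(previous token, its own token], and a hypothetical helper name that is not
part of Cassandra. With a higher replication factor, more nodes can lose
replica ranges.

```python
import bisect


def node_losing_range(existing_tokens, new_token):
    """Return the token of the existing node that gives up part of its range.

    Before the new node joins, new_token belongs to the first existing
    token that is >= new_token, wrapping to the smallest token.
    """
    ordered = sorted(existing_tokens)
    if new_token in ordered:
        raise ValueError("new_token collides with an existing token")
    index = bisect.bisect_left(ordered, new_token)
    return ordered[index % len(ordered)]


if __name__ == "__main__":
    seeds = [0, 2**126]
    # A new node at 3/4 of a 2**127 ring takes its range from the node at 0.
    print(node_losing_range(seeds, 3 * 2**125))
```

In the decommission case the departing node's range is handed to a remaining
node, so under the same assumptions no live node loses a range and there is
nothing to clean up.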
