=> Adding a new node between other nodes would avoid running move, but the ring would be unbalanced, right? Would this leave one node overloaded (holding 1/2 of the range while the other 2 nodes hold 1/4 each, assuming 3 nodes)? I'm referring to http://wiki.apache.org/cassandra/Operations#Load_balancing
    Yes, if you're using a single vnode per server, or are running an older version of Cassandra.  For lowest impact, doubling the size of your cluster is recommended so that you can avoid doing moves.  Or if you're on Cassandra 1.2+, you can use vnodes, and you should not typically need to rebalance after bringing a new server online.
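
To make the imbalance concrete, here is a minimal Python sketch. It assumes RandomPartitioner (token space of size 2**127), one token per node, and that each node owns the range from the previous token (exclusive) up to its own token (inclusive). The function names `balanced_tokens` and `ownership` are hypothetical helpers for illustration, not part of Cassandra or nodetool.

```python
from fractions import Fraction

# Assumption: RandomPartitioner, whose token space has size 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count nodes: i * 2**127 / N."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring its node owns.

    A node owns (previous_token, own_token], wrapping around the ring.
    """
    ordered = sorted(tokens)
    result = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        # A single node owns the whole ring, hence the "or RING_SIZE".
        width = (token - previous) % RING_SIZE or RING_SIZE
        result[token] = Fraction(width, RING_SIZE)
    return result


if __name__ == "__main__":
    seeds = balanced_tokens(2)      # tokens 0 and 2**126, as in the thread
    in_between = seeds + [2 ** 125]  # new node placed between the two seeds
    for token, share in sorted(ownership(in_between).items()):
        print(token, share)
    # Doubling the cluster keeps every existing token, so no move is needed.
    print(set(balanced_tokens(2)) <= set(balanced_tokens(4)))
```

Under these assumptions, placing the third node at 2**125 leaves the node at token 0 with half the ring and the other two with a quarter each, while doubling from 2 to 4 nodes reuses both existing tokens.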


    On Tue, Jul 9, 2013 at 9:31 PM, Rodrigo Felix <rodrigofelixdealmeida@gmail.com> wrote:
    Thank you very much for your response. My comments on your email follow.

    Regards,

    Rodrigo Felix de Almeida
    LSBD - Universidade Federal do Ceará
    Project Manager
    MBA, CSM, CSPO, SCJP


    On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli <rcoli@eventbrite.com> wrote:
    On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix <rodrigofelixdealmeida@gmail.com> wrote:
    • Is it normal to take about 9 minutes to add a new node? The log generated by the script that adds a new node follows.
    Sure.  => OK
    • Is there a way to reduce the time to start cassandra?
    Not usually. => OK
    • Sometimes the cleanup operation takes many minutes (about 10). Is this normal, given that the amount of data is small (1.7 GB at most per seed)?
    Compaction is throttled, and cleanup is a type of compaction. Bootstrap is also throttled via the streaming throttle. => OK
    • I start with two seeds whose tokens are 0 and 85070591730234615865843651857942052864. When I add a new machine, do I need to run move and cleanup on both seeds? Currently, I run cleanup on seed 0, move + cleanup on the other seed, and neither move nor cleanup on the newly added node. Is this OK?
    Only nodes which have "lost" ranges need to run cleanup. In general you should add new nodes "between" other nodes such that "move" is not required at all. 

    => Adding a new node between other nodes would avoid running move, but the ring would be unbalanced, right? Would this leave one node overloaded (holding 1/2 of the range while the other 2 nodes hold 1/4 each, assuming 3 nodes)? I'm referring to http://wiki.apache.org/cassandra/Operations#Load_balancing
    • What if I do not run cleanup on any existing node when adding or removing a node? Is the data that was not "cleaned up" still served if I send a scan, for instance, whose range is still on the node but would not be there had I run cleanup? Or would the data be gathered from another node, i.e. the one that properly owns the range specified in the scan query?
    If data for range [x] is on node [a] but node [a] is no longer considered an endpoint for range [x], it will never receive a request to serve range [x]. => OK
    • After decommissioning a node, is it advisable to run cleanup on the remaining nodes? Are the consequences of not running it the same as when adding a node?
    Cleanup is only for the node which lost a range. In decommission case, no live nodes lost a range, only some nodes gained one. => OK

    =Rob
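
The rule "only nodes which have lost ranges need cleanup" can be sketched for the simplest case. The snippet below assumes a replication factor of 1 and one token per node, so the only node that loses data when a node is added is the next token clockwise from the new one. With a higher replication factor more nodes lose replicas, which this sketch does not model. `node_losing_range_on_add` is a hypothetical helper, not a Cassandra API.

```python
import bisect


def node_losing_range_on_add(existing_tokens, new_token):
    """Return the token of the node that loses a range when new_token joins.

    Assumption: replication factor 1 and one token per node. The new node
    takes (previous_token, new_token] away from its clockwise successor.
    """
    if new_token in existing_tokens:
        raise ValueError("token already in use")
    ordered = sorted(existing_tokens)
    index = bisect.bisect_right(ordered, new_token)
    # Past the highest token, the successor wraps around to the lowest one.
    return ordered[index % len(ordered)]


if __name__ == "__main__":
    seeds = [0, 2 ** 126]  # the two seed tokens from the thread
    # A node added at 2**125 sits between the seeds; only the seed at
    # 2**126 gives up data, so only that seed would need cleanup.
    print(node_losing_range_on_add(seeds, 2 ** 125) == 2 ** 126)
```

On decommission the same successor gains the departing node's range instead of losing one, which matches the answer that no live node needs cleanup in that case.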