On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix <rodrigofelixdealmeida@gmail.com> wrote:
  • Is it normal to take about 9 minutes to add a new node? The log generated by the script that adds a new node follows.
  • Is there a way to reduce the time to start cassandra?
Not usually. 
  • Sometimes the cleanup operation takes many minutes (about 10). Is this normal, given that the amount of data is small (1.7 GB at maximum per seed)?
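Compaction is throttled (compaction_throughput_mb_per_sec in cassandra.yaml), and cleanup is a type of compaction. Bootstrap is also throttled, via the streaming throttle (stream_throughput_outbound_megabits_per_sec).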
  • I start with two seeds whose tokens are 0 and 85070591730234615865843651857942052864. When I add a new machine, do I need to execute move and cleanup on both seeds? Currently, I run cleanup on seed 0, move + cleanup on the other seed, and neither move nor cleanup on the newly added node. Is this OK?
Only nodes which have "lost" ranges need to run cleanup. In general you should add new nodes "between" other nodes such that "move" is not required at all. 
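To illustrate adding a node "between" others, here is a minimal sketch that picks the midpoint of the widest gap on the ring. It assumes RandomPartitioner (token space 0 to 2**127, which matches the two tokens in the question) and one token per node; the function names are made up for this example and are not part of Cassandra.

```python
# Assumption: RandomPartitioner, whose token space is [0, 2**127).
RING_SIZE = 2 ** 127


def midpoint_token(left, right, ring_size=RING_SIZE):
    """Token halfway along the arc that runs clockwise from left to right."""
    arc = (right - left) % ring_size
    if arc == 0:
        arc = ring_size  # a single node: the arc is the whole ring
    return (left + arc // 2) % ring_size


def widest_gap_token(tokens, ring_size=RING_SIZE):
    """Midpoint of the widest gap between neighbouring tokens (first one on ties)."""
    ring = sorted(tokens)
    best = None
    for i, left in enumerate(ring):
        right = ring[(i + 1) % len(ring)]  # wraps around to the first token
        arc = (right - left) % ring_size or ring_size
        if best is None or arc > best[0]:
            best = (arc, left, right)
    _, left, right = best
    return midpoint_token(left, right, ring_size)


existing = [0, 85070591730234615865843651857942052864]
print(widest_gap_token(existing))  # 42535295865117307932921825928971026432 (2**125)
```

A node bootstrapped with that token splits one existing range, so no existing node has to move. The ring is uneven with three nodes placed this way; a fourth node at the midpoint of the remaining wide gap evens it out again.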
  • What if I do not run cleanup on any existing node when adding or removing a node? If I send a scan, for instance, and its range is still on a node only because cleanup was not run, is that leftover data still served? Or would the data be gathered from another node, i.e. the one that properly owns the range specified in the scan query?
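If data for range [x] is on node [a] but node [a] is no longer considered an endpoint for range [x], it will never receive a request to serve range [x]. Requests for that range go to its current endpoints, so the leftover data on [a] is simply ignored until cleanup removes it.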
  • After decommissioning a node, is it advisable to run cleanup on the remaining nodes? Are the consequences of not running it the same as when adding a node?
Cleanup is only for a node which lost a range. In the decommission case, no live node lost a range; some nodes only gained one.
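The "lost a range" rule can be sketched as a comparison of the ring before and after a topology change. This is a simplified model, not Cassandra code: it assumes one token per node, replication factor 1, and that no node is moved, and the helper names are hypothetical. With a higher replication factor, more than one node can lose a replica range when a node is added.

```python
# Assumption: RandomPartitioner token space, one token per node, RF = 1.
RING_SIZE = 2 ** 127


def primary_ranges(tokens):
    """Map each token to the (start, end] range it owns.

    A node owns everything after the previous token up to its own token;
    index -1 makes the first node wrap around to the last one.
    """
    ring = sorted(tokens)
    return {tok: (ring[i - 1], tok) for i, tok in enumerate(ring)}


def range_width(start, end):
    """Size of the (start, end] arc; a zero difference means the whole ring."""
    return (end - start) % RING_SIZE or RING_SIZE


def nodes_needing_cleanup(before, after):
    """Tokens of surviving nodes whose owned range shrank, i.e. lost a range."""
    old, new = primary_ranges(before), primary_ranges(after)
    return sorted(
        tok
        for tok in old.keys() & new.keys()
        if range_width(*new[tok]) < range_width(*old[tok])
    )


# Adding a node at 2**125 between 0 and 2**126: only the node at 2**126 shrinks.
print(nodes_needing_cleanup([0, 2**126], [0, 2**125, 2**126]))

# Decommissioning that node again: the survivors only gain, so the list is empty.
print(nodes_needing_cleanup([0, 2**125, 2**126], [0, 2**126]))
```

In this model the node that shrinks on an add is the new node's successor on the ring, which is the only one worth running cleanup on.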