We are seeing an issue where data that was written to the cluster is no longer accessible after we tried to expand the cluster. I will try to provide as much information as possible; I am just starting out with Cassandra and I'm not entirely sure which details are relevant.
All Cassandra nodes run version 1.2.5, and every node has the same configuration.
We started by moving our entire data set to a single Cassandra node. This node was initially set up with `initial_token: 0` and otherwise default settings. After all the data had been moved over, we decided to add 2 more nodes and raise the replication factor (RF) to 2. We also decided to start using vnodes, which meant setting `num_tokens` to 256 and removing the `initial_token` parameter. We then decided to run cassandra-shuffle as well.
While cassandra-shuffle was running, we noticed that some rows were disappearing and then reappearing, while other rows have not come back at all. I stopped the shuffle, ran a repair on each node, and then restarted the cluster; however, not all of the data has come back. Note that we are querying at CONSISTENCY ALL.
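With RF 2 and CONSISTENCY ALL, every replica of a key has to respond, and which nodes count as replicas is derived from the token ring. The sketch below illustrates that lookup, assuming SimpleStrategy-style placement (the post does not say which replication strategy the keyspace uses). The function name, node names, and tokens are hypothetical, and this is a simplified model, not Cassandra's own code.

```python
import bisect


def replicas_for_token(ring, key_token, rf):
    """Return the first `rf` distinct nodes found walking clockwise from key_token.

    `ring` is a list of (token, node) pairs; with vnodes the same node appears
    many times. A key belongs to the first token >= its own token (wrapping
    around to the lowest token), and further replicas are the next distinct
    nodes on the ring, as assumed for SimpleStrategy.
    """
    ring = sorted(ring)
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, key_token)  # len(ring) wraps to index 0
    replicas = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in replicas:  # skip further vnodes of an already chosen node
            replicas.append(node)
            if len(replicas) == rf:
                break
    return replicas


# Hypothetical toy ring: nodes A, B, C with small made-up tokens.
before = [(0, "A"), (100, "B"), (200, "A"), (300, "C")]
print(replicas_for_token(before, 150, 2))  # ['A', 'C']
print(replicas_for_token(before, 350, 2))  # ['A', 'B'] (wraps around)

# The same ring after token 200 has been relocated from A to B.
after = [(0, "A"), (100, "B"), (200, "B"), (300, "C")]
print(replicas_for_token(after, 150, 2))  # ['B', 'C']
```

In this model, relocating a single token changes the replica set for every key in the range that token owns, which is the kind of change a shuffle is meant to make while it runs.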
Here is my `nodetool status` output. What is odd is the token distribution, 260/239/1. I'm not an expert, but I believe it should be 256/256/256, or at least add up to 768.

    --  Address       Load       Tokens  Owns   Host ID                               Rack
    UN  10.129.196.4  371.56 GB  260     38.1%  cde6c3be-a066-47f2-abc2-b1d78bee0d7c  196
    UN  10.129.196.5  212.64 GB  239     61.5%  2cb24510-2f89-46b2-96b9-873f8e8e50da  196
    UN  10.129.196.6  256.05 GB  1       0.4%   ce8d4ea9-8106-44b3-a2dd-c0230eb53c94  196

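The Owns column above sums to 100% (38.1 + 61.5 + 0.4), so it appears to report raw token ownership rather than ownership adjusted for RF 2. As a rough illustration of what that number measures, the sketch below computes each node's share of the ring from its tokens, assuming the Murmur3Partitioner token range (the post does not state the partitioner). The `ownership` function and the toy tokens are hypothetical.

```python
from collections import defaultdict

# Assumed Murmur3Partitioner: tokens span [-2**63, 2**63 - 1], i.e. 2**64 values.
RING_SIZE = 2**64


def ownership(ring):
    """Return {node: fraction of the token ring} for a list of (token, node) pairs.

    Each token owns the range (previous token, token]; for the lowest token,
    the range wraps around from the highest token.
    """
    ring = sorted(ring)
    if len(ring) == 1:
        return {ring[0][1]: 1.0}  # a lone token owns the whole ring
    owned = defaultdict(int)
    for i, (token, node) in enumerate(ring):
        prev = ring[i - 1][0]  # for i == 0 this is the highest token (wrap-around)
        owned[node] += (token - prev) % RING_SIZE
    return {node: span / RING_SIZE for node, span in owned.items()}


# Hypothetical toy ring with three tokens.
toy = [(0, "A"), (2**62, "B"), (-2**62, "C")]
print(ownership(toy))  # {'C': 0.5, 'A': 0.25, 'B': 0.25}
```

Under this model a node's share depends only on the gaps in front of its tokens, so a node holding a single token among hundreds of others can only cover a small slice of the ring, which matches the pattern in the third row.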
What is also odd is that the token counts reported by `nodetool -h [host] info` differ from those shown by `nodetool status`:

    root@cass1:~# nodetool -h cass1 info | grep Token
    Token : (invoke with -T/--tokens to see all 239 tokens)
    root@cass1:~# nodetool -h cass2 info | grep Token
    Token : (invoke with -T/--tokens to see all 269 tokens)
    root@cass1:~# nodetool -h cass3 info | grep Token
    Token : (invoke with -T/--tokens to see all 260 tokens)

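A quick tally of the two outputs makes the discrepancy concrete: the per-node counts from `nodetool info` add up to the expected 3 × 256 = 768, while `nodetool status` accounts for only 500 tokens. The variable names below are illustrative; the numbers are copied from the output above.

```python
# Token counts copied from the nodetool output above. The post does not say
# which hostname maps to which IP, so only the totals are compared.
status_tokens = {"10.129.196.4": 260, "10.129.196.5": 239, "10.129.196.6": 1}
info_tokens = {"cass1": 239, "cass2": 269, "cass3": 260}

NODES = 3
NUM_TOKENS = 256  # num_tokens as configured in cassandra.yaml
expected_total = NODES * NUM_TOKENS

status_total = sum(status_tokens.values())
info_total = sum(info_tokens.values())

print(f"expected total: {expected_total}")  # expected total: 768
print(f"status total: {status_total}")  # status total: 500
print(f"info total: {info_total}")  # info total: 768
print(f"missing from status: {expected_total - status_total}")  # missing from status: 268
```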
I believe this has something to do with the cluster not "seeing" all of the tokens, but I am not sure where to go from here. I don't believe any data was actually lost: there was no power outage, and all of the data should have been committed to disk before we added the two other nodes.