I have a 4-node cluster set up in 2 zones with NetworkTopologyStrategy and strategy options that write one copy to each zone, so each machine effectively holds 50% of the data.
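The 50% figure follows from the replication settings. Below is a minimal sketch of the arithmetic, assuming tokens are balanced within each zone so that a zone's replicas are spread evenly over its nodes. The zone names and the `per_node_share` helper are hypothetical, for illustration only.

```python
from typing import Dict


def per_node_share(rf_per_zone: Dict[str, int],
                   nodes_per_zone: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the full data set stored on each node of each zone.

    Assumes (hypothetically) balanced tokens inside each zone, so every
    node in a zone holds an equal slice of that zone's replicas.
    """
    return {zone: rf / nodes_per_zone[zone] for zone, rf in rf_per_zone.items()}


# Hypothetical zone names: one replica per zone, two nodes per zone.
shares = per_node_share({"zone1": 1, "zone2": 1}, {"zone1": 2, "zone2": 2})
print(shares)  # {'zone1': 0.5, 'zone2': 0.5}
```

Computing the share per zone rather than over the whole cluster matters with NetworkTopologyStrategy, because each zone's replica count is configured independently.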
I have a column family with gc_grace_seconds of 10 days (the default). On the 17th an insert was done to this column family, and our application logs show that the client got a successful response back at write consistency ONE. I can verify that the inserted key exists in the commit logs of both replicas, yet it seems the record was never stored. I used list to fetch all rows of the column family (about 800) and examined them to see whether our application could have deleted it. Since gc_grace_seconds has not elapsed, list should still have shown the row to me if it had been deleted in the past few days. I could not find it.
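The gc_grace_seconds reasoning can be checked with a little date arithmetic. A tombstone can only be dropped by compaction once it is older than gc_grace_seconds; until then it is retained. Because any deletion must have happened after the insert on the 17th, measuring from the insert time is a conservative bound. This is a sketch: the `tombstone_may_be_purged` helper is hypothetical, and the year and month are made up since the post only gives the day.

```python
from datetime import datetime, timedelta

GC_GRACE_SECONDS = 864000  # Cassandra's default gc_grace_seconds: 10 days


def tombstone_may_be_purged(deleted_at: datetime, now: datetime,
                            gc_grace_seconds: int = GC_GRACE_SECONDS) -> bool:
    """Return True once a tombstone is older than gc_grace_seconds.

    Before that point compaction keeps the tombstone, so a deletion
    made inside the window has not been purged yet.
    """
    return now - deleted_at > timedelta(seconds=gc_grace_seconds)


# Hypothetical dates: the post only says "the 17th".
inserted = datetime(2012, 1, 17, 12, 0)
checked = datetime(2012, 1, 22, 12, 0)
print(tombstone_may_be_purged(inserted, checked))  # False: only 5 days have passed
```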
While this insert was happening, I was performing a rolling upgrade of Cassandra from 1.1.3 to 1.1.5: I took one node down at a time, upgraded the package, restarted the service, and then moved on to the next node. I could see from system.log that some mutations were replayed from the commit log during those restarts, so I assume the memtables were not flushed before each restart.
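With one copy per zone and only one node down at a time, every key still has at least one live replica at each step of the rolling upgrade. That is consistent with the client receiving a successful response at ONE, though it says nothing about whether the mutation survived the restarts. The sketch below enumerates every possible replica placement under that assumption; the node names and helper functions are hypothetical, and it ignores hinted handoff and coordinator behaviour entirely.

```python
from itertools import product

# Hypothetical node names: two nodes in each of the two zones.
ZONES = {"zone1": ["n1", "n2"], "zone2": ["n3", "n4"]}


def replica_sets(zones):
    """Every possible replica set when each key gets exactly one copy per zone."""
    return [set(choice) for choice in product(*zones.values())]


def writable_at_one(replicas, down):
    """A write at consistency ONE needs at least one live replica to acknowledge it."""
    return len(replicas - down) >= 1


def rolling_restart_keeps_one(zones):
    """Take nodes down one at a time and check every replica set stays writable at ONE."""
    nodes = [n for members in zones.values() for n in members]
    return all(writable_at_one(r, {node})
               for node in nodes
               for r in replica_sets(zones))


print(rolling_restart_keeps_one(ZONES))  # True
```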
Could this procedure cause the inserted row to disappear? How can I troubleshoot this? I am running out of ideas.
Your help is greatly appreciated.