"This looks like the counters were more out of sync before the upgrade than after?"
My guess is that the upgrade makes some counters over-count: I saw the sum of our daily counters increase by 2000 after each restart, at the exact moment the node was marked as up. The sum was about 5000 before the upgrade, 7000 after upgrading the first node, 9000 after the second, and finally 11000. This value shouldn't have increased, since we had stopped our Storm topology and were queuing events instead of writing directly into C*.
"Do you know if your client is retrying counter operations ? (I saw some dropped messages in the S1 log). "
I am using phpCassa written by Tyler Hobbs. I have a pool configured like this:
$max_retries = 4;       // retry a failed operation up to 4 times
$send_timeout = 2000;   // milliseconds
$recv_timeout = 2000;   // milliseconds
$pool_size = NULL;      // default: max(5, count($servers) * 2)
$pool = new ConnectionPool($base, $servers, $pool_size, $max_retries, $send_timeout, $recv_timeout);
So I guess my client is retrying counter operations like any other operation: up to 4 times, with a 2-second timeout.
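To illustrate why retried counter operations can over-count, here is a rough simulation in Python (not phpCassa code). Everything in it is hypothetical: the FlakyCounter class, the lost_ack_rate parameter and the helper names are made up for the example. It also assumes the worst case, where a timed-out increment has always been applied on the server, and it assumes max_retries means "retries after the first attempt", which I have not checked against phpCassa.

```python
import random


class FlakyCounter:
    """Hypothetical stand-in for a counter column (not a real client API)."""

    def __init__(self, lost_ack_rate, seed=0):
        self.value = 0
        self.lost_ack_rate = lost_ack_rate
        self._rng = random.Random(seed)

    def add(self, delta):
        # Assumption: the increment is always applied on the server side...
        self.value += delta
        # ...but the acknowledgement is sometimes lost, which the client
        # sees as a timeout.
        if self._rng.random() < self.lost_ack_rate:
            raise TimeoutError("no ack within recv_timeout")


def add_with_retries(counter, delta, max_retries=4):
    """Try once, then retry up to max_retries times on timeout."""
    for _ in range(max_retries + 1):
        try:
            counter.add(delta)
            return True
        except TimeoutError:
            continue  # blind retry: the increment is applied again
    return False


def simulate(n_events, lost_ack_rate, max_retries=4, seed=0):
    """Send n_events increments of 1 and return the stored counter value."""
    counter = FlakyCounter(lost_ack_rate, seed)
    for _ in range(n_events):
        add_with_retries(counter, 1, max_retries)
    return counter.value


if __name__ == "__main__":
    print("no timeouts:      ", simulate(1000, 0.0))
    print("20% lost acks:    ", simulate(1000, 0.2))
    print("20%, retries off: ", simulate(1000, 0.2, max_retries=0))
```

With no lost acknowledgements the stored value matches the number of events. As soon as some acknowledgements are lost, each blind retry adds the delta again, because a counter increment is not idempotent the way a regular column write is.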
"S1 shows a lot of Commit Log replay going on. Reading your timeline below this sounds like the auto restart catching you out."
Is there a way to disable this auto-restart when upgrading from the DataStax repository on Ubuntu?
Let me know if you need something more to understand what happened.