Hi,


The compression strategy on my production cluster was LZ4, and I changed it to Deflate.

To apply the compression change, we had to run nodetool upgradesstables to force all existing SSTables to be rewritten with the new compression strategy.

But once the upgradesstables command completed on all 5 nodes in the cluster, my requests started to fail, both reads and writes.

Replication Factor - 3
Read Consistency - 1
Write Consistency - 1
FYI - I am also using lightweight transactions
Cassandra Version 3.10
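
For reference, here is a minimal sketch of the consistency arithmetic for the settings above. It assumes the usual rule of thumb that a read is only guaranteed to see the latest write when read replicas + write replicas > replication factor. The function name is my own and is only for illustration. With RF 3 and consistency 1 for both reads and writes the rule does not hold, which may be related to the digest mismatches in the log below.

```python
def replicas_overlap(rf: int, read_cl: int, write_cl: int) -> bool:
    """Return True when every read is guaranteed to touch at least one
    replica that acknowledged the write (the R + W > RF rule of thumb)."""
    return read_cl + write_cl > rf


# Values taken from the settings listed above.
RF, READ_CL, WRITE_CL = 3, 1, 1
current_overlap = replicas_overlap(RF, READ_CL, WRITE_CL)

# For comparison: a quorum for RF = 3 is 3 // 2 + 1 = 2 replicas.
quorum = RF // 2 + 1
quorum_overlap = replicas_overlap(RF, quorum, quorum)

print("overlap with CL 1 / 1:", current_overlap)
print("overlap with quorum reads and writes:", quorum_overlap)
```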

I am now seeing the following errors in my debug.log file, and some of my requests have started to fail:

debug.log:

ERROR [ReadRepairStage:82952] 2018-04-09 19:05:20,669 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82952,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
DEBUG [ReadRepairStage:82953] 2018-04-09 19:05:22,932 ReadCallback.java:242 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-2666936192316364820, 5756f5b8e7b341afa22cef22c5d33260) (d29a0e2a05f81315f0945dee5a210060 vs d41d8cd98f00b204e9800998ecf8427e)
at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
INFO  [HintsDispatcher:767] 2018-04-09 19:05:24,874 HintsDispatchExecutor.java:283 - Finished hinted handoff of file 68c7c130-6cf8-4864-bde8-1819f238045c-1523315072851-1.hints to endpoint 68c7c130-6cf8-4864-bde8-1819f238045c, partially
DEBUG [ReadRepairStage:82950] 2018-04-09 19:05:24,932 DataResolver.java:169 - Timeout while read-repairing after receiving all 1 data and digest responses
ERROR [ReadRepairStage:82950] 2018-04-09 19:05:24,933 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82950,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
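
To show how these entries break down, here is a rough sketch that counts log entries per level and thread stage. The regular expression and the function name are my own assumptions based on the line format in the excerpt above, not an official Cassandra tool. The sample lines are the entry headers copied from that excerpt; stack-trace lines are skipped because they do not start with a log level.

```python
import re
from collections import Counter

# Matches entry headers such as "ERROR [ReadRepairStage:82952] ...".
# Captures the log level and the thread stage without its numeric suffix.
LINE_RE = re.compile(r"^(ERROR|WARN|INFO|DEBUG)\s+\[([^\]:]+)(?::\d+)?\]")


def count_by_level_and_stage(lines):
    """Count log entries keyed by (level, stage); non-header lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "ERROR [ReadRepairStage:82952] 2018-04-09 19:05:20,669 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82952,5,main]",
    "org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.",
    "DEBUG [ReadRepairStage:82953] 2018-04-09 19:05:22,932 ReadCallback.java:242 - Digest mismatch:",
    "INFO  [HintsDispatcher:767] 2018-04-09 19:05:24,874 HintsDispatchExecutor.java:283 - Finished hinted handoff of file 68c7c130-6cf8-4864-bde8-1819f238045c-1523315072851-1.hints to endpoint 68c7c130-6cf8-4864-bde8-1819f238045c, partially",
    "DEBUG [ReadRepairStage:82950] 2018-04-09 19:05:24,932 DataResolver.java:169 - Timeout while read-repairing after receiving all 1 data and digest responses",
    "ERROR [ReadRepairStage:82950] 2018-04-09 19:05:24,933 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82950,5,main]",
]

counts = count_by_level_and_stage(sample)
for (level, stage), n in sorted(counts.items()):
    print(level, stage, n)
```

The same function could be pointed at the full file, for example with open("debug.log") as f: count_by_level_and_stage(f), assuming the file is in the current directory.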

nodetool info shows:

Gossip active          : true
Thrift active          : false
Native Transport active: true
Load                   : 280.43 GiB
Generation No          : 1514537104
Uptime (seconds)       : 8810363
Heap Memory (MB)       : 1252.06 / 3970.00
Off Heap Memory (MB)   : 573.33
Data Center            : dc1
Rack                   : rack1
Exceptions             : 18987
Key Cache              : entries 351612, size 99.86 MiB, capacity 100 MiB, 11144584 hits, 21126425 requests, 0.528 recent hit rate, 14400 save period in seconds
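
As a quick sanity check on these numbers, the sketch below recomputes the key cache hit rate and the heap usage from the values printed above. The variable names are my own; the figures are copied from the nodetool info output.

```python
# Figures copied from the nodetool info output above.
key_cache_hits = 11144584
key_cache_requests = 21126425
heap_used_mb = 1252.06
heap_max_mb = 3970.00

# Overall key cache hit rate: hits divided by requests.
hit_rate = key_cache_hits / key_cache_requests

# Fraction of the maximum heap currently in use.
heap_fraction = heap_used_mb / heap_max_mb

print(f"key cache hit rate: {hit_rate:.3f}")
print(f"heap in use: {heap_fraction:.1%}")
```

So roughly half of the key cache lookups are hits, and the heap is only about a third full.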


Out of the 5 nodes, one specific node has a high number of dropped mutations ("Around 560Kb") and dropped reads, even though that node has the same configuration as the others and owns an equal amount of data.

We tried repairing that node, but that did not bring down the dropped mutation count, and requests kept failing.

We restarted the Cassandra service on that node, but the dropped mutation count increased to 70Bytes within an hour.


I hope someone can help me with this.


Thanks,
Hitesh Dua
hiteshdua1@gmail.com