cassandra-user mailing list archives

From hitesh dua <hiteshd...@gmail.com>
Subject Single Node Timeout Error and High Dropped Mutation after Upgradesstables
Date Wed, 11 Apr 2018 11:03:16 GMT
Hi,


My compression strategy in production was *LZ4 compression*, but I changed
it to Deflate.

To apply the compression change, we had to run *nodetool upgradesstables* to
force all sstables to be rewritten with the new compression settings.

But once the upgradesstables command had completed on all 5 nodes in the
cluster, my requests started to fail, both reads and writes.

Replication Factor - 3
Read Consistency - 1
Write Consistency - 1
FYI - I am also using lightweight transactions
Cassandra Version 3.10

I am now seeing the following errors in my debug.log file, and some of my
requests have started to fail:

debug.log

ERROR [ReadRepairStage:82952] 2018-04-09 19:05:20,669 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82952,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
DEBUG [ReadRepairStage:82953] 2018-04-09 19:05:22,932 ReadCallback.java:242 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-2666936192316364820, 5756f5b8e7b341afa22cef22c5d33260) (d29a0e2a05f81315f0945dee5a210060 vs d41d8cd98f00b204e9800998ecf8427e)
    at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.10.jar:3.10]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_144]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
INFO  [HintsDispatcher:767] 2018-04-09 19:05:24,874 HintsDispatchExecutor.java:283 - Finished hinted handoff of file 68c7c130-6cf8-4864-bde8-1819f238045c-1523315072851-1.hints to endpoint 68c7c130-6cf8-4864-bde8-1819f238045c, partially
DEBUG [ReadRepairStage:82950] 2018-04-09 19:05:24,932 DataResolver.java:169 - Timeout while read-repairing after receiving all 1 data and digest responses
ERROR [ReadRepairStage:82950] 2018-04-09 19:05:24,933 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82950,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]

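
One detail in the "Digest mismatch" entry may be worth checking: the second digest, d41d8cd98f00b204e9800998ecf8427e, is the well-known MD5 hash of empty input. A minimal Python sketch to confirm this follows. It assumes the digests in the log are MD5 hex strings, and the variable names are only illustrative. If the assumption holds, one replica may have returned no data for that key, but that is an interpretation, not something the log states.

```python
import hashlib

# Second digest copied from the "Digest mismatch" line in debug.log.
LOGGED_DIGEST = "d41d8cd98f00b204e9800998ecf8427e"

# MD5 of zero bytes of input.
empty_md5 = hashlib.md5(b"").hexdigest()

# True means the logged digest equals the hash of empty input.
digest_is_empty_md5 = (empty_md5 == LOGGED_DIGEST)
print(digest_is_empty_md5)
```
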
nodetool info shows:

Gossip active          : true
Thrift active          : false
Native Transport active: true
Load                   : 280.43 GiB
Generation No          : 1514537104
Uptime (seconds)       : 8810363
Heap Memory (MB)       : 1252.06 / 3970.00
Off Heap Memory (MB)   : 573.33
Data Center            : dc1
Rack                   : rack1
*Exceptions             : 18987*
Key Cache              : entries 351612, size 99.86 MiB, capacity 100 MiB, 11144584 hits, 21126425 requests, 0.528 recent hit rate, 14400 save period in seconds

Out of the 5 nodes, one specific node has a high number of dropped mutations
("around 560Kb") and dropped reads, even though that node has the same
configuration as the others and owns an equal amount of data.

We tried repairing that node, but that did not bring down the dropped
mutations, and requests kept failing.

We restarted the Cassandra service on that node, but the dropped mutation
count rose to 70Bytes within an hour on that node.
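
To make the per-node comparison concrete, here is a minimal Python sketch of how dropped-message counts could be pulled out of `nodetool tpstats` output and compared across nodes. It assumes the output contains a "Message type / Dropped" table of two-column rows, as on 3.x. The function names, node names, and sample numbers are hypothetical and are not taken from my cluster.

```python
def parse_dropped(tpstats_text):
    """Return {message_type: dropped_count} from `nodetool tpstats` text."""
    dropped = {}
    in_section = False
    for line in tpstats_text.splitlines():
        parts = line.split()
        # The dropped-message table starts at the "Message type  Dropped" header.
        if parts[:2] == ["Message", "type"]:
            in_section = True
            continue
        # Rows in the table are "<TYPE> <count>".
        if in_section and len(parts) == 2 and parts[1].isdigit():
            dropped[parts[0]] = int(parts[1])
    return dropped


def node_with_most_dropped(per_node_text, message_type="MUTATION"):
    """Return (node, count) for the node with the highest dropped count."""
    counts = {
        node: parse_dropped(text).get(message_type, 0)
        for node, text in per_node_text.items()
    }
    node = max(counts, key=counts.get)
    return node, counts[node]


# Hypothetical tpstats excerpts for two nodes (illustrative values only).
SAMPLE_NODE_A = """\
Pool Name                    Active   Pending      Completed
MutationStage                     0         0           1200

Message type           Dropped
READ                        12
MUTATION                   560
READ_REPAIR                  3
"""

SAMPLE_NODE_B = """\
Message type           Dropped
READ                         0
MUTATION                     4
READ_REPAIR                  0
"""

samples = {"node-a": SAMPLE_NODE_A, "node-b": SAMPLE_NODE_B}
print(node_with_most_dropped(samples))
```

In practice the text for each node would be captured from running `nodetool tpstats` on that node. How it is collected (ssh, a monitoring agent, and so on) is left out here.
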


I hope someone can help me with this.


Thanks,
Hitesh dua
hiteshdua1@gmail.com
