cassandra-user mailing list archives

From Fay Hou [Storage Service] ­ <fay...@coupang.com>
Subject Re: Cassandra 3.11 is compacting forever
Date Fri, 01 Sep 2017 22:54:23 GMT
Try doing a rolling restart of the cluster before running a compaction.
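
A rolling restart here means restarting one node at a time so the rest of the cluster keeps serving requests. A minimal sketch of that loop follows; the host names, ssh access, and the `sudo service cassandra restart` command are assumptions for illustration and will differ per installation. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Rolling-restart sketch; host names and the service command are assumptions.
set -eu

HOSTS="${HOSTS:-cass-node-1 cass-node-2 cass-node-3}"  # hypothetical host names
DRY_RUN="${DRY_RUN:-1}"                                 # 1 = only print the commands

# Run a command on one host over ssh, or just print it in dry-run mode.
run_on() {
  run_host="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "[$run_host] $*"
  else
    ssh "$run_host" "$@"
  fi
}

# Drain and restart a single node, then wait until nodetool answers again.
restart_node() {
  node="$1"
  run_on "$node" nodetool drain                  # flush memtables, stop accepting writes
  run_on "$node" sudo service cassandra restart  # assumed service name and init system
  until run_on "$node" nodetool status >/dev/null 2>&1; do
    sleep 10
  done
}

# One node at a time, so the rest of the cluster keeps serving requests.
for host in $HOSTS; do
  restart_node "$host"
done
```

Setting `DRY_RUN=0` would run the commands for real. A successful `nodetool status` only shows that the node answers again, so it is probably worth checking that the node is reported as UN before moving to the next one.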

On Fri, Sep 1, 2017 at 3:09 PM, Igor Leão <igor.leao@ubee.in> wrote:

> Some generic errors:
>
> *[aladdin@ip-172-16-1-10 cassandra]$ tail cassandra.log | grep -i error*
> *[aladdin@ip-172-16-1-10 cassandra]$ tail cassandra.log | grep -i excep*
> *[aladdin@ip-172-16-1-10 cassandra]$ tail cassandra.log | grep -i fail*
> *[aladdin@ip-172-16-1-10 cassandra]$ tail debug.log | grep -i error*
> *[aladdin@ip-172-16-1-10 cassandra]$ tail debug.log | grep -i exce*
> *[aladdin@ip-172-16-1-10 cassandra]$ tail debug.log | grep -i fail*
> *DEBUG [GossipStage:1] 2017-09-01 15:33:27,046 FailureDetector.java:457 -
> Ignoring interval time of 2108299431 for /172.16.1.112*
> *DEBUG [GossipStage:1] 2017-09-01 15:33:29,051 FailureDetector.java:457 -
> Ignoring interval time of 2005507384 for /172.16.1.74*
> *DEBUG [GossipStage:1] 2017-09-01 15:33:45,968 FailureDetector.java:457 -
> Ignoring interval time of 2003371497 for /172.16.1.74*
> *DEBUG [GossipStage:1] 2017-09-01 15:33:51,133 FailureDetector.java:457 -
> Ignoring interval time of 2013260173 for /172.16.1.74*
> *DEBUG [GossipStage:1] 2017-09-01 15:33:58,981 FailureDetector.java:457 -
> Ignoring interval time of 2009620081 for /172.16.1.112*
> *DEBUG [GossipStage:1] 2017-09-01 15:34:19,235 FailureDetector.java:457 -
> Ignoring interval time of 2010956256 for /172.16.1.74*
> *DEBUG [GossipStage:1] 2017-09-01 15:34:19,235 FailureDetector.java:457 -
> Ignoring interval time of 2011127930 for /10.0.1.122*
> *[aladdin@ip-172-16-1-10 cassandra]$ tail system.log | grep -i error*
> *io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
> failed: Connection reset by peer*
> *[aladdin@ip-172-16-1-10 cassandra]$ tail system.log | grep -i exce*
> *INFO  [Native-Transport-Requests-5] 2017-09-01 15:22:58,806
> Message.java:619 - Unexpected exception during request; channel = [id:
> 0xdd63db2f, L:/10.0.1.47:9042 ! R:/10.0.44.196:41422]*
> *io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
> failed: Connection reset by peer*
> *[aladdin@ip-172-16-1-10 cassandra]$ tail system.log | grep -i fail*
> *io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
> failed: Connection reset by peer*
>
>
> Some interesting errors:
>
> 1.
> *DEBUG [ReadRepairStage:1] 2017-09-01 15:34:58,485 ReadCallback.java:242 -
> Digest mismatch:*
> *org.apache.cassandra.service.DigestMismatchException: Mismatch for key
> DecoratedKey(5988282114260523734,
> 32623331326162652d633533332d343237632d626334322d306466643762653836343830)
> (023d99bbcf2263f0fa450c2312fdce88 vs a60ba37a46e0a61227a8b560fa4e0dfb)*
> * at
> org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
> ~[apache-cassandra-3.11.0.jar:3.11.0]*
> * at
> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233)
> ~[apache-cassandra-3.11.0.jar:3.11.0]*
> * at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_112]*
> * at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_112]*
> * at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> [apache-cassandra-3.11.0.jar:3.11.0]*
> * at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]*
>
> 2.
> *INFO  [Native-Transport-Requests-5] 2017-09-01 15:22:58,806
> Message.java:619 - Unexpected exception during request; channel = [id:
> 0xdd63db2f, L:/10.0.1.47:9042 ! R:/10.0.44.196:41422]*
> *io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
> failed: Connection reset by peer*
> * at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]*
> *INFO  [Native-Transport-Requests-11] 2017-09-01 15:31:42,722
> NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot
> allocate chunk of 1.000MiB*
>
> *INFO  [CompactionExecutor:470] 2017-09-01 10:16:42,026
> NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot
> allocate chunk of 1.000MiB*
> *INFO  [CompactionExecutor:475] 2017-09-01 10:31:42,032
> NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot
> allocate chunk of 1.000MiB*
> *INFO  [CompactionExecutor:478] 2017-09-01 10:46:42,108
> NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot
> allocate chunk of 1.000MiB*
> *INFO  [CompactionExecutor:482] 2017-09-01 11:01:42,131
> NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot
> allocate chunk of 1.000MiB*
>
> Regarding this last error: I tried increasing `file_cache_size_in_mb` on
> this node to 2048, but the message only changed to:
> *INFO  [ReadStage-2] 2017-09-01 16:18:38,657 NoSpamLogger.java:91 -
> Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB*
>
> 2017-09-01 9:07 GMT-03:00 kurt greaves <kurt@instaclustr.com>:
>
>> Are you seeing any errors in the logs? Is that one compaction still
>> getting stuck?
>>
>

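A note on the `tail ... | grep` checks quoted above: `tail` prints only the last ten lines by default, so an empty result says little about the rest of the file. The `grep -i fail` hits in debug.log also appear to match only because of the class name `FailureDetector`. A minimal sketch that scans whole files instead is below; the function names are made up for illustration, and the log paths in the usage comment are assumptions.

```shell
#!/bin/sh
# Scan whole log files instead of only the last ten lines that `tail` prints.
set -eu

# Print "<file>: <count>" for lines matching error/exception/failure keywords.
count_matches() {
  for log_file in "$@"; do
    # grep -c exits 1 when the count is 0, so guard it under `set -e`.
    count="$(grep -ciE 'error|excep|fail' "$log_file" || true)"
    echo "$log_file: $count"
  done
}

# Show the most recent matches with their line numbers (default: last 20).
recent_matches() {
  grep -inE 'error|excep|fail' "$1" | tail -n "${2:-20}"
}

# Example (paths are assumptions; adjust to the node's log directory):
# count_matches system.log debug.log
# recent_matches system.log 5
```

The counts include benign lines such as the `FailureDetector` DEBUG messages, so they are a starting point for reading the matches rather than an error total.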