cassandra-commits mailing list archives

From "Blake Eggleston (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
Date Wed, 18 Nov 2015 18:56:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011675#comment-15011675 ]

Blake Eggleston commented on CASSANDRA-10477:
---------------------------------------------

It seems like adding a paxos commit equivalent of StorageProxy.insertLocal, and submitting
local commits that way, would be the safest thing to do here. In theory, you should be able
to add a check against the local address to StorageProxy.shouldHint and just drop the commit
message if the node is overloaded; the node should get back up to speed on the next paxos round.
However, there may be subtleties and edge cases that I'm not thinking of, so I don't want to
recommend that without giving it more thought.
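To make the suggested shouldHint check concrete, here is a minimal, self-contained Java sketch. It is not Cassandra code: the class LocalCommitSketch, its methods and the hintedHandoffEnabled flag are hypothetical stand-ins for StorageProxy.shouldHint, StorageProxy.submitHint and the node's broadcast address. It assumes that the failing assertion in submitHint guards against hinting the local node, which is consistent with the /192.168.11.88 message in the report. Only the timeout path is modeled: a commit to the local node that times out is dropped instead of hinted.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical, simplified model of the suggestion above; not Cassandra code.
// All names here are invented stand-ins for StorageProxy internals.
public class LocalCommitSketch {
    private final InetAddress localBroadcast; // assumed: the node's broadcast address
    private final boolean hintedHandoffEnabled;
    private int hintsWritten = 0;

    public LocalCommitSketch(InetAddress localBroadcast, boolean hintedHandoffEnabled) {
        this.localBroadcast = localBroadcast;
        this.hintedHandoffEnabled = hintedHandoffEnabled;
    }

    // Parses an IP literal (no DNS lookup for literals), rethrowing as unchecked.
    public static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    // Stand-in for submitHint: assumed to reject hints aimed at the local node,
    // which is the condition the AssertionError in the report points to.
    private void submitHint(InetAddress target) {
        if (target.equals(localBroadcast)) {
            throw new AssertionError(target);
        }
        hintsWritten++; // a real implementation would store the hint here
    }

    // Stand-in for shouldHint with the suggested extra check against the
    // local address: never hint the local node.
    public boolean shouldHint(InetAddress target) {
        return hintedHandoffEnabled && !target.equals(localBroadcast);
    }

    // Models an expired paxos commit callback. Returns true if a hint was
    // written; otherwise the commit is dropped, on the assumption that the
    // replica catches up on the next paxos round.
    public boolean onCommitTimeout(InetAddress target) {
        if (!shouldHint(target)) {
            return false;
        }
        submitHint(target);
        return true;
    }

    public int hintsWritten() {
        return hintsWritten;
    }

    public static void main(String[] args) {
        LocalCommitSketch node = new LocalCommitSketch(ip("192.168.11.88"), true);
        System.out.println(node.onCommitTimeout(ip("192.168.11.88"))); // false: local commit dropped
        System.out.println(node.onCommitTimeout(ip("192.168.11.89"))); // true: remote commit hinted
        System.out.println(node.hintsWritten());                      // 1
    }
}
```

Without the !target.equals(localBroadcast) term in shouldHint, the first call would reach submitHint and throw an AssertionError whose message is /192.168.11.88, the same message as in the reported trace. The sketch does not model the alternative the comment prefers (a paxos commit counterpart of StorageProxy.insertLocal), because its details are not spelled out here.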

> java.lang.AssertionError in StorageProxy.submitHint
> ---------------------------------------------------
>
>                 Key: CASSANDRA-10477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: CentOS 6, Oracle JVM 1.8.45
>            Reporter: Severin Leonhardt
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> A few days after updating from 2.0.15 to 2.1.9 we have the following log entry on 2 of 5 machines:
> {noformat}
> ERROR [EXPIRING-MAP-REAPER:1] 2015-10-07 17:01:08,041 CassandraDaemon.java:223 - Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,main]
> java.lang.AssertionError: /192.168.11.88
>         at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_45]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_45]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_45]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {noformat}
> 192.168.11.88 is the broadcast address of the local machine.
> When this is logged, the read request latency of the whole cluster degrades badly, from 6 ms/op to more than 100 ms/op according to OpsCenter. Clients get a lot of timeouts. We need to restart the affected Cassandra node to restore normal read latencies. Write latency does not seem to be affected.
> Disabling hinted handoff using {{nodetool disablehandoff}} only prevents the assertion from being logged; at some point the read latency degrades again. Restarting the node where hinted handoff was disabled brings the read latency back down.
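The assertion message in the trace is simply the target address: InetAddress.toString() prints an address parsed from an IP literal as an empty host name, a slash and the address, so /192.168.11.88 identifies the local node's own broadcast address as the hint target. The Java sketch below reproduces that message format. checkNotLocal is a hypothetical guard invented for illustration, modeled on the assumption that the failing check in submitHint rejects hints aimed at the local node; it is not the StorageProxy source.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only: not the StorageProxy source.
public class AssertionMessageSketch {

    // Parses an IP literal; literals are not looked up via DNS, so the
    // resulting InetAddress has no host name and prints as "/a.b.c.d".
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    // Invented guard, assumed to mirror the failing check: a hint must never
    // target the local broadcast address. The address becomes the message.
    static void checkNotLocal(InetAddress target, InetAddress localBroadcast) {
        if (target.equals(localBroadcast)) {
            throw new AssertionError(target);
        }
    }

    public static void main(String[] args) {
        InetAddress local = ip("192.168.11.88");
        try {
            checkNotLocal(local, local);
        } catch (AssertionError e) {
            System.out.println(e); // java.lang.AssertionError: /192.168.11.88
        }
    }
}
```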



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
