cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
Date Mon, 07 Dec 2015 09:01:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044588#comment-15044588 ]

Sylvain Lebresne commented on CASSANDRA-10477:
----------------------------------------------

bq. The assertion doesn't care if hints are disabled along with several of the other things
that are added.

First, I still don't understand why it's not consistent between 2.1 and 3.0. As far as I can
tell, the {{WriteCallbackInfo.shouldHint()}} method mostly calls {{StorageProxy.shouldHint()}},
which does pretty much the same thing in both versions. Second, I'd argue the assertion _must_
use {{!shouldHint()}}, because what we're trying to assert is that {{submitHint}} is never
called for localhost on the expiration of a callback, and that depends on the result of {{shouldHint()}}.
That said, I think it would almost be better to have the assertion just be {{!target.equals(FBUtilities.getBroadcastAddress())}},
as we're basically saying a local write should always use the specific local path, not {{MessagingService}}.
In any case, I think the assertion is worth a quick comment to explain why we're asserting
that here.
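
To make the simpler form of the assertion concrete, here is a minimal, self-contained sketch of what it might look like together with the explanatory comment. It is an illustration only, not the actual {{StorageProxy}} code: the class {{HintAssertionSketch}}, the helpers {{addr}}, {{isRemote}} and {{broadcastAddress}} (a hard-coded stand-in for {{FBUtilities.getBroadcastAddress()}}), and the one-argument {{submitHint}} are all hypothetical. Java only evaluates {{assert}} statements when assertions are enabled with {{-ea}}.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative sketch only; this is not the actual StorageProxy code.
final class HintAssertionSketch {

    // Hypothetical stand-in for FBUtilities.getBroadcastAddress(), hard-coded
    // to the address that appears in the reported AssertionError.
    static InetAddress broadcastAddress() {
        return addr(192, 168, 11, 88);
    }

    // Helper that builds an IPv4 address from its four octets without a DNS lookup.
    static InetAddress addr(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the target is some other node, i.e. a legitimate hint target.
    static boolean isRemote(InetAddress target) {
        return !target.equals(broadcastAddress());
    }

    // Simplified, hypothetical submitHint; the real method takes more arguments.
    static void submitHint(InetAddress target) {
        // Local writes should always go through the dedicated local write path,
        // never through MessagingService, so a callback expiring for the local
        // node (and ending up here) would indicate a bug.
        assert isRemote(target) : target;
        // ... the hint would be queued for 'target' here (omitted) ...
    }

    public static void main(String[] args) {
        submitHint(addr(192, 168, 11, 89));                   // a remote node
        System.out.println(isRemote(addr(192, 168, 11, 88))); // the local node
    }
}
```

Keeping the check in a small predicate such as {{isRemote}} is only a convenience for the sketch; inlining the comparison in the {{assert}} would express the same invariant.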

The rest of the changes lgtm, but the unit tests on 3.0 don't seem to have run due to some
problem with an {{@Override}}.

bq. and prognosticate on how I want to test OE

The lack of coverage of OE is certainly something we should fix (it's not trivial though),
but I would suggest not blocking this fix on it, since it's not directly related (meaning
we should probably open a separate ticket for it).


> java.lang.AssertionError in StorageProxy.submitHint
> ---------------------------------------------------
>
>                 Key: CASSANDRA-10477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: CentOS 6, Oracle JVM 1.8.45
>            Reporter: Severin Leonhardt
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> A few days after updating from 2.0.15 to 2.1.9 we have the following log entry on 2 of 5 machines:
> {noformat}
> ERROR [EXPIRING-MAP-REAPER:1] 2015-10-07 17:01:08,041 CassandraDaemon.java:223 - Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,main]
> java.lang.AssertionError: /192.168.11.88
>         at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_45]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_45]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_45]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {noformat}
> 192.168.11.88 is the broadcast address of the local machine.
> When this is logged the read request latency of the whole cluster becomes very bad, from 6 ms/op to more than 100 ms/op according to OpsCenter. Clients get a lot of timeouts. We need to restart the affected Cassandra node to get back normal read latencies. It seems write latency is not affected.
> Disabling hinted handoff using {{nodetool disablehandoff}} only prevents the assert from being logged. At some point the read latency becomes bad again. Restarting the node where hinted handoff was disabled results in the read latency being better again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
