cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-6948) After Bootstrap or Replace node startup, EXPIRING_MAP_REAPER is shutdown and cannot be restarted, causing callbacks to collect indefinitely
Date Fri, 28 Mar 2014 18:37:16 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Benedict updated CASSANDRA-6948:
--------------------------------

    Attachment: 6948.txt

After reading CASSANDRA-3335 and CASSANDRA-3737, I cannot find a good justification for the
shutdown method:

# The executor service itself blocks nothing (it is independent of the callback registration,
only cleans up concurrently), and is on a daemon thread, so is completely harmless.
# The shutdown parameter which ostensibly prevents callbacks from being added both does not
work, and would be dangerous if it _did_ work, as any thread caught before a _restart_ would
block indefinitely, and having a thread restart from that position after a clearing of the
map seems problematic to me also; it seems much simpler to just let everyone be, since that's
what is happening already and seems to work fine.


> After Bootstrap or Replace node startup, EXPIRING_MAP_REAPER is shutdown and cannot be
restarted, causing callbacks to collect indefinitely
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6948
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6948
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Keith Wright
>            Assignee: Benedict
>         Attachments: 6948.debug.txt, 6948.txt, Screen Shot 2014-03-28 at 11.27.56 AM.png,
Screen Shot 2014-03-28 at 11.29.24 AM.png, cassandra.log.min, cassandra.yaml, logs.old.tar.gz,
logs.tar.gz, system.log.1.gz, system.log.gz
>
>
> Since ExpiringMap.shutdown() shuts down the static executor service, it cannot be restarted
(and in fact reset() makes no attempt to do so). As such callbacks that receive no response
are never removed from the map, and eventually either than server will run out of memory or
will loop around the integer space and start reusing messageids that have not been expired,
causing assertions to be thrown and messages to fail to be sent. It appears that this situation
only arises on bootstrap or node replacement, as MessagingService is shutdown before being
attached to the listen address.
> This can cause the following errors to begin occurring in the log:
> ERROR [Native-Transport-Requests:7636] 2014-03-28 13:32:10,638 ErrorMessage.java (line
222) Unexpected exception during request
> java.lang.AssertionError: Callback already exists for id -1665979622! (CallbackInfo(target=/10.106.160.84,
callback=org.apache.cassandra.service.WriteResponseHandler@5d36d8ea, serializer=org.apache.cassandra.db.WriteResponse$WriteResponseSerializer@6ed37f0b))
> 	at org.apache.cassandra.net.MessagingService.addCallback(MessagingService.java:549)
> 	at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:601)
> 	at org.apache.cassandra.service.StorageProxy.mutateCounter(StorageProxy.java:984)
> 	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:449)
> 	at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:524)
> 	at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:521)
> 	at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:505)
> 	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:188)
> 	at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:358)
> 	at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:131)
> 	at org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:304)
> 	at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43)
> 	at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> ERROR [ReplicateOnWriteStage:102766] 2014-03-28 13:32:10,638 CassandraDaemon.java (line
196) Exception in thread Thread[ReplicateOnWriteStage:102766,5,main]
> java.lang.AssertionError: Callback already exists for id -1665979620! (CallbackInfo(target=/10.106.160.84,
callback=org.apache.cassandra.service.WriteResponseHandler@3bdb1a75, serializer=org.apache.cassandra.db.WriteResponse$WriteResponseSerializer@6ed37f0b))
> 	at org.apache.cassandra.net.MessagingService.addCallback(MessagingService.java:549)
> 	at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:601)
> 	at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:806)
> 	at org.apache.cassandra.service.StorageProxy$8$1.runMayThrow(StorageProxy.java:1074)
> 	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1896)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message