activemq-dev mailing list archives

From "Jim Robinson (JIRA)" <>
Subject [jira] [Commented] (AMQ-5082) ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
Date Wed, 01 Apr 2015 00:17:54 GMT


Jim Robinson commented on AMQ-5082:

It's not at all clear to me how master_stopped is sometimes true
and sometimes false.  I'm wondering if there's a section of the
code that I'm missing where the broker is discarded and then
recreated on failure.

I'm running a round of tests after having made a change to
reset master_stopped after master.start():

diff --git a/activemq-leveldb-store/src/main/scala/org/apache/activemq/leveldb/replicated/ElectingLevelDBStore.scala
index 331d06b..a47baab 100644
--- a/activemq-leveldb-store/src/main/scala/org/apache/activemq/leveldb/replicated/ElectingLevelDBStore.scala
+++ b/activemq-leveldb-store/src/main/scala/org/apache/activemq/leveldb/replicated/ElectingLevelDBStore.scala
@@ -228,6 +228,7 @@ class ElectingLevelDBStore extends ProxyLevelDBStore {
+      master_stopped.set(false)
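
To illustrate why resetting the flag after master.start() might matter, here is a minimal sketch. It is not the ElectingLevelDBStore code: the Store class, the resetOnStart switch, and the compareAndSet guard in stopMaster() are all assumptions made for illustration. It only models a "stopped" flag on an object that is reused across elections instead of being discarded and recreated.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class MasterFlagSketch {

    // Hypothetical stand-in for the store; only the flag handling is modelled.
    static class Store {
        final AtomicBoolean masterStopped = new AtomicBoolean(false);
        final boolean resetOnStart;
        int stopCount = 0;

        Store(boolean resetOnStart) {
            this.resetOnStart = resetOnStart;
        }

        void startMaster() {
            // The real master.start() would run here.
            if (resetOnStart) {
                // Mirrors the one-line change in the diff above.
                masterStopped.set(false);
            }
        }

        void stopMaster() {
            // Assumed guard: stop only runs if the flag was still false.
            if (masterStopped.compareAndSet(false, true)) {
                stopCount++;
            }
        }
    }

    // Elect and demote the same Store instance twice, then report
    // how many of the two stop requests actually ran.
    static int stopsAfterTwoElections(boolean resetOnStart) {
        Store store = new Store(resetOnStart);
        store.startMaster();
        store.stopMaster();
        store.startMaster();
        store.stopMaster();
        return store.stopCount;
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + stopsAfterTwoElections(false)); // 1
        System.out.println("with reset:    " + stopsAfterTwoElections(true));  // 2
    }
}
```

Under these assumptions, a reused instance without the reset skips its second stop because the flag is still true from the first one. Whether the real store follows this path is exactly the open question above.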

> ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
> -------------------------------------------------------------------
>                 Key: AMQ-5082
>                 URL:
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: activemq-leveldb-store
>    Affects Versions: 5.9.0, 5.10.0
>            Reporter: Scott Feldstein
>            Assignee: Christian Posta
>            Priority: Critical
>             Fix For: 5.12.0
>         Attachments: 03-07.tgz, amq_5082_threads.tar.gz, mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure, zookeeper.out-cluster.failure
> I have a 3-node AMQ cluster and one ZooKeeper node, using replicatedLevelDB persistence:
> {code}
>         <persistenceAdapter>
>             <replicatedLevelDB
>               directory="${}/leveldb"
>               replicas="3"
>               bind="tcp://"
>               zkAddress="zookeep0:2181"
>               zkPath="/activemq/leveldb-stores"/>
>         </persistenceAdapter>
> {code}
> After about a day of sitting idle there are cascading failures and the cluster stops listening altogether.
> I can reproduce this consistently on 5.9 and the latest 5.10 (commit 2360fb859694bacac1e48092e53a56b388e1d2f0). I am going to attach logs from the three mq nodes and the zookeeper logs that reflect the time where the cluster starts having issues.
> The cluster stops listening Mar 4, 2014 4:56:50 AM (within 5 seconds).
> The OSs are all CentOS 5.9 on one ESX server, so I doubt networking is an issue.
> If you need more data it should be pretty easy to get whatever is needed since it is consistently reproducible.
> This bug may be related to AMQ-5026, but looks different enough to file a separate issue.

This message was sent by Atlassian JIRA