activemq-users mailing list archives

From Antonio Terreno <antonio.terr...@gmail.com>
Subject Replicated LevelDB Store not working
Date Wed, 30 Oct 2013 10:12:07 GMT
Hi all,
We are having trouble setting up a three-node cluster of ActiveMQ 5.9.0
with LevelDB and ZooKeeper, as described on this page:
http://activemq.apache.org/replicated-leveldb-store.html

The configuration of the three servers is in this gist:
https://gist.github.com/aterreno/7229464 (the only difference between the
three instances is the hostname; we use the IP address of each host).

First of all, performance seems worse than with a local KahaDB store (JMS
failover URL with no replica). More importantly, whenever we take down the
master with the command bin/activemq stop, the slave that tries to become
master gets IOExceptions from LevelDB.

Most of the time we get this:

2013-10-30 10:06:24,395 | INFO  | No IOExceptionHandler registered,
ignoring IO exception | org.apache.activemq.broker.BrokerService | LevelDB
IOException handler.
java.io.IOException: Could not open table 5
        at
org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
        at
org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:543)
        at
org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:974)
        at
org.apache.activemq.leveldb.LevelDBClient.listCollections(LevelDBClient.scala:1092)
        at
org.apache.activemq.leveldb.DBManager$$anonfun$3.apply(DBManager.scala:808)
        at
org.apache.activemq.leveldb.DBManager$$anonfun$3.apply(DBManager.scala:808)
        at
org.fusesource.hawtdispatch.package$RichExecutorTrait$$anonfun$future$1.apply$mcV$sp(hawtdispatch.scala:117)
        at
org.fusesource.hawtdispatch.package$$anon$4.run(hawtdispatch.scala:357)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: Could not open table 5
        at org.iq80.leveldb.impl.TableCache.getTable(TableCache.java:87)
        at org.iq80.leveldb.impl.TableCache.newIterator(TableCache.java:69)
        at org.iq80.leveldb.impl.TableCache.newIterator(TableCache.java:64)
        at org.iq80.leveldb.impl.Version.getLevel0Files(Version.java:139)
        at org.iq80.leveldb.impl.DbImpl.internalIterator(DbImpl.java:757)
        at org.iq80.leveldb.impl.DbImpl.iterator(DbImpl.java:722)
        at org.iq80.leveldb.impl.DbImpl.iterator(DbImpl.java:83)
        at
org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorPrefixed(LevelDBClient.scala:273)
        at
org.apache.activemq.leveldb.LevelDBClient$$anonfun$listCollections$1.apply$mcV$sp(LevelDBClient.scala:1096)
        at
org.apache.activemq.leveldb.LevelDBClient$$anonfun$listCollections$1.apply(LevelDBClient.scala:1092)
        at
org.apache.activemq.leveldb.LevelDBClient$$anonfun$listCollections$1.apply(LevelDBClient.scala:1092)
        at
org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:968)
        at
org.apache.activemq.leveldb.LevelDBClient$$anonfun$might_fail_using_index$1.apply(LevelDBClient.scala:974)
        at
org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:540)
        ... 9 more
Caused by: java.io.FileNotFoundException:
/root/apache-activemq-5.9.0/data/dirty.index/000005.sst (No such file or
directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at
org.iq80.leveldb.impl.TableCache$TableAndFile.<init>(TableCache.java:112)
        at
org.iq80.leveldb.impl.TableCache$TableAndFile.<init>(TableCache.java:102)
        at org.iq80.leveldb.impl.TableCache$1.load(TableCache.java:57)
        at org.iq80.leveldb.impl.TableCache$1.load(TableCache.java:54)
        at
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3579)
        at
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2372)
        at
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2335)
        at
com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2250)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3980)
        at
com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3984)
        at
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4868)
        at org.iq80.leveldb.impl.TableCache.getTable(TableCache.java:80)

(I've just reproduced it by taking down the master, bringing it back up, and
then killing the master that got elected.)

I suspect this happens because the master is not being stopped properly.
This is the shutdown log:

./bin/activemq stop
INFO: Loading '/root/.activemqrc'
INFO: Using java '/opt/molsfw/java/latest7/bin/java'
INFO: Waiting at least 30 seconds for regular process termination of pid
'98937' :
Java Runtime: Oracle Corporation 1.7.0_17 /opt/molsfw/java/jdk7u17/jre
  Heap sizes: current=1006848k  free=1001594k  max=1006848k
    JVM args: -Xms1G -Xmx1G
-Djava.util.logging.config.file=logging.properties -Dhawtio.realm=activemq
-Dhawtio.role=admins
-Dhawtio.rolePrincipalClasses=org.apache.activemq.jaas.GroupPrincipal
-Djava.security.auth.login.config=/root/apache-activemq-5.9.0/conf/login.config
-Dactivemq.classpath=/root/apache-activemq-5.9.0/conf;
-Dactivemq.home=/root/apache-activemq-5.9.0
-Dactivemq.base=/root/apache-activemq-5.9.0
-Dactivemq.conf=/root/apache-activemq-5.9.0/conf
-Dactivemq.data=/root/apache-activemq-5.9.0/data
Extensions classpath:

[/root/apache-activemq-5.9.0/lib,/root/apache-activemq-5.9.0/lib/camel,/root/apache-activemq-5.9.0/lib/optional,/root/apache-activemq-5.9.0/lib/web,/root/apache-activemq-5.9.0/lib/extra]
ACTIVEMQ_HOME: /root/apache-activemq-5.9.0
ACTIVEMQ_BASE: /root/apache-activemq-5.9.0
ACTIVEMQ_CONF: /root/apache-activemq-5.9.0/conf
ACTIVEMQ_DATA: /root/apache-activemq-5.9.0/data
Connecting to pid: 98937
Stopping broker: localhost
.............................
INFO: Regular shutdown not successful,  sending SIGKILL to process with pid
'98937'

So some files are probably not being flushed or closed properly.

How can we fix this?

Even if this is fixed for the command-line stop, it still worries me: if the
JVM goes down for any other reason, the store on disk seems to end up
corrupted, and I really don't want someone to have to go in and clean up
the mess.

Funnily enough, on AMQ instance #3, which has exactly the same configuration
(but is in slave mode), the stop command works fine.

./bin/activemq stop
INFO: Using default configuration
(you can configure options in one of these file: /etc/default/activemq
/root/.activemqrc)

INFO: Invoke the following command to create a configuration file
./bin/activemq setup [ /etc/default/activemq | /root/.activemqrc ]

INFO: Using java '/opt/molsfw/java/latest7/bin/java'
INFO: Waiting at least 30 seconds for regular process termination of pid
'53586' :
Java Runtime: Oracle Corporation 1.7.0_17 /opt/molsfw/java/jdk7u17/jre
  Heap sizes: current=1006848k  free=1001594k  max=1006848k
    JVM args: -Xms1G -Xmx1G
-Djava.util.logging.config.file=logging.properties -Dhawtio.realm=activemq
-Dhawtio.role=admins
-Dhawtio.rolePrincipalClasses=org.apache.activemq.jaas.GroupPrincipal
-Djava.security.auth.login.config=/root/apache-activemq-5.9.0/conf/login.config
-Dactivemq.classpath=/root/apache-activemq-5.9.0/conf;
-Dactivemq.home=/root/apache-activemq-5.9.0
-Dactivemq.base=/root/apache-activemq-5.9.0
-Dactivemq.conf=/root/apache-activemq-5.9.0/conf
-Dactivemq.data=/root/apache-activemq-5.9.0/data
Extensions classpath:

[/root/apache-activemq-5.9.0/lib,/root/apache-activemq-5.9.0/lib/camel,/root/apache-activemq-5.9.0/lib/optional,/root/apache-activemq-5.9.0/lib/web,/root/apache-activemq-5.9.0/lib/extra]
ACTIVEMQ_HOME: /root/apache-activemq-5.9.0
ACTIVEMQ_BASE: /root/apache-activemq-5.9.0
ACTIVEMQ_CONF: /root/apache-activemq-5.9.0/conf
ACTIVEMQ_DATA: /root/apache-activemq-5.9.0/data
Connecting to pid: 53586
INFO: failed to resolve jmxUrl for pid:53586, using default JMX url
Connecting to JMX URL: service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi
.Stopping broker: localhost
. FINISHED

Another frequent error we get when trying to stop ActiveMQ is:

ACTIVEMQ_OPTS_MEMORY="-Xms3G -Xmx3G" ./bin/activemq stop
INFO: Loading '/root/.activemqrc'
INFO: Using java '/opt/molsfw/java/latest7/bin/java'
INFO: Waiting at least 30 seconds for regular process termination of pid
'55182' :
Error occurred during initialization of VM
Could not reserve enough space for object heap
.............................
INFO: Regular shutdown not successful,  sending SIGKILL to process with pid
'55182'

A 3 GB heap seems pretty big to me, so I doubt that is the real problem.

To give as much information as possible, the AMQ nodes are running on:

SmartOS 5.11  release (REL)
joyent_20130529T165900Z  version (VER)

java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)

While testing failover/resilience, we are sending messages to this
connection string:
"failover://(tcp://10.251.76.45:61616,tcp://10.251.76.58:61616,tcp://10.251.76.60:61616)"
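
For reference, here is a minimal sketch of how that connection string can be assembled from the three broker addresses so that no stray whitespace or line break ends up inside it. The helper name `failover_url` is hypothetical and not part of ActiveMQ; the hosts and port are the ones quoted above.

```python
# Broker addresses as listed in this message.
BROKERS = ["10.251.76.45", "10.251.76.58", "10.251.76.60"]


def failover_url(hosts, port=61616):
    """Join the hosts into a single failover URI with no embedded whitespace.

    Hypothetical helper for illustration only; it just formats a string.
    """
    endpoints = ",".join(f"tcp://{host.strip()}:{port}" for host in hosts)
    return f"failover://({endpoints})"


url = failover_url(BROKERS)
print(url)
```

The resulting string is what gets passed to the JMS client as the broker URL.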

I hope the problem is clear, but I'll reiterate: we have three machines,
10.251.76.45 (#1), 10.251.76.58 (#2) and 10.251.76.60 (#3).
We have a fully working ZooKeeper cluster
(10.251.76.39:2181,10.251.76.40:2181,10.251.76.52:2181) and we want
high availability by leveraging the latest version of AMQ with LevelDB
persistence.

What is the problem with this configuration?
https://gist.github.com/aterreno/7229464

Thanks a lot,

toni
