activemq-users mailing list archives

From Hiram Chirino <hi...@hiramchirino.com>
Subject Re: Replicated LevelDB Store not working
Date Wed, 30 Oct 2013 15:44:57 GMT
On Wed, Oct 30, 2013 at 6:12 AM, Antonio Terreno
<antonio.terreno@gmail.com> wrote:
> Hi all,
> we are having some troubles setting up a cluster of three nodes of ActiveMQ
> v.5.9.0 with LevelDB and Zookeeper, as described in this page:
> http://activemq.apache.org/replicated-leveldb-store.html.
>
> The configuration of the 3 servers is in this gist:
> https://gist.github.com/aterreno/7229464. (the only difference between the
> 3 instances is the hostname, we use the IP of the host)
>

That looks good.
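For anyone following the thread without opening the gist: a replicatedLevelDB
persistence adapter for this topology would typically look like the sketch
below.  The zkAddress comes from the IPs mentioned later in this thread; the
bind port and zkPath are assumed defaults, the hostname must be changed to
each node's own IP, and the brokerName must be identical on all three nodes
so they join the same replication group.

```xml
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="mybroker">
  <persistenceAdapter>
    <replicatedLevelDB
        directory="${activemq.data}/leveldb"
        replicas="3"
        bind="tcp://0.0.0.0:61619"
        zkAddress="10.251.76.39:2181,10.251.76.40:2181,10.251.76.52:2181"
        zkPath="/activemq/leveldb-stores"
        hostname="10.251.76.45"/>
  </persistenceAdapter>
</broker>
```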

> First of all, performance seems worse than with a local KahaDB store (JMS
> failover URL with no replication),

Yeah, we really have not done much benchmarking to compare the two yet,
but it's probably due to the extra work needed to replicate the data to
the slaves.  Are you hitting any CPU/network/disk bottlenecks?

> but more importantly,
> whenever we try to take down the master with the command bin/activemq stop,
> the slave that tries to become master gets IOExceptions from LevelDB.

That's not expected.  You're using the pure Java LevelDB driver, which
might still have some bugs in it.
It would be interesting to see whether this failure is isolated to that
driver.  If you get a chance, copy the following jar into your
distribution's 'lib' directory:
    http://repo1.maven.org/maven2/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar

That should get you to use the JNI implementation.  You'll know it's
being used when you see the following message logged:
    INFO | Using the JNI LevelDB implementation.
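For example, something along these lines should do it (the install path is an
assumption, adjust it to your environment):

```shell
# Sketch: drop the JNI LevelDB driver into the broker's lib directory.
# AMQ_HOME is an assumed install path; adjust for your environment.
AMQ_HOME="${AMQ_HOME:-/opt/apache-activemq-5.9.0}"
JAR=leveldbjni-all-1.8.jar
URL="http://repo1.maven.org/maven2/org/fusesource/leveldbjni/leveldbjni-all/1.8/$JAR"

if curl -fsSL -o "/tmp/$JAR" "$URL"; then
  cp "/tmp/$JAR" "$AMQ_HOME/lib/"
  echo "installed $JAR into $AMQ_HOME/lib"
else
  echo "download failed; fetch $URL manually and copy it into $AMQ_HOME/lib"
fi
# Then restart the broker and watch the log for the JNI message above.
```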

> (I've just reproduced it by taking down master, putting back master and
> killing the master that got elected)
>
> I feel like this might happen because we are not stopping the master
> properly; the log for the shutdown is this one:
>
> ./bin/activemq stop
<snip>
> INFO: failed to resolve jmxUrl for pid:53586, using default JMX url
> Connecting to JMX URL: service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi
> .Stopping broker: localhost
> . FINISHED
>
> Another frequent error we get when trying to stop ActiveMQ is:
>
> ACTIVEMQ_OPTS_MEMORY="-Xms3G -Xmx3G" ./bin/activemq stop
> INFO: Loading '/root/.activemqrc'
> INFO: Using java '/opt/molsfw/java/latest7/bin/java'
> INFO: Waiting at least 30 seconds for regular process termination of pid
> '55182' :
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> .............................
> INFO: Regular shutdown not successful,  sending SIGKILL to process with pid
> '55182'

That seems really weird since we are just doing a JMX remote call to
stop the running server.


> And while testing the failover/resilience we are sending messages to this
> connection string: "failover://(tcp://10.251.76.45:61616,tcp://
> 10.251.76.58:61616,tcp://10.251.76.60:61616) "
>
> I hope that the problem is clear, but I'll reiterate: we have 3 machines,
> 10.251.76.45 (#1), 10.251.76.58 (#2) and 10.251.76.60 (#3).
> We have a fully working ZooKeeper cluster (10.251.76.39:2181,10.251.76.40:2181
> ,10.251.76.52:2181) and we want to have high availability by leveraging the
> latest version of AMQ & LevelDB persistence.
>
> What is the problem with this configuration?
> https://gist.github.com/aterreno/7229464


Yep, seems simple enough.  I don't see anything wrong with the
configuration; it just seems like there might still be some bugs in the
implementation.  Thanks for reporting it.  Hopefully we can get
to the bottom of it soon.
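One sanity check while you test failover: with replicated LevelDB only the
elected master binds the client transport port, so you can probe which node
is currently master from the outside.  A rough sketch (IPs and the 61616
port taken from your connection string; `nc` availability is an assumption):

```shell
# Probe which broker is master: only the elected master accepts client
# connections on the transport port; slaves refuse until they take over.
for ip in 10.251.76.45 10.251.76.58 10.251.76.60; do
  if nc -z -w 2 "$ip" 61616 2>/dev/null; then
    echo "$ip: listening (likely master)"
  else
    echo "$ip: not listening (slave or down)"
  fi
done
```

If more than one node ever reports listening, the cluster has split-brained
and the IOExceptions you see would not be surprising.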

>
> Thanks a lot,
>
> toni



-- 
Hiram Chirino

Engineering | Red Hat, Inc.

hchirino@redhat.com | fusesource.com | redhat.com

skype: hiramchirino | twitter: @hiramchirino

blog: Hiram Chirino's Bit Mojo
