activemq-users mailing list archives

From Christian Posta <christian.po...@gmail.com>
Subject Re: ActiveMQ crashes frequently
Date Mon, 03 Jun 2013 03:10:47 GMT
You should check out the failover transport to handle reconnecting.
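
A minimal sketch of what that looks like, assuming an ActiveMQ 5.x Java client. The failover transport is selected entirely through the broker URL, in the form failover:(tcp://host:61616), and reconnect tuning is passed as URL options. The helper below only assembles such a URL with the Java standard library. The host name is a placeholder, the values are illustrative, and the option names (initialReconnectDelay, maxReconnectDelay, useExponentialBackOff) should be double-checked against the failover transport reference for the version in use.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class FailoverUrl {
    // Builds "failover:(uri1,uri2,...)" plus an optional "?key=value&..." query.
    static String build(List<String> brokerUris, Map<String, String> options) {
        String url = "failover:(" + String.join(",", brokerUris) + ")";
        if (options.isEmpty()) {
            return url;
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        options.forEach((key, value) -> query.add(key + "=" + value));
        return url + query;
    }

    // Example options; the values are illustrative assumptions, not recommendations.
    static Map<String, String> exampleOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("initialReconnectDelay", "100");   // ms before the first retry
        opts.put("maxReconnectDelay", "30000");     // upper bound on the retry delay (ms)
        opts.put("useExponentialBackOff", "true");  // grow the delay between retries
        return opts;
    }

    public static void main(String[] args) {
        // Placeholder host; the printed URL would be handed to new ActiveMQConnectionFactory(url).
        System.out.println(build(List.of("tcp://broker.example:61616"), exampleOptions()));
    }
}
```

The resulting string replaces the plain tcp:// URL given to the ActiveMQConnectionFactory constructor. The client then reconnects underneath the existing Connection, so producers and consumers should not need to be relaunched after a broker restart; by default, sends simply block until the connection is re-established.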

On Sunday, June 2, 2013, fenbers wrote:

> I don't know how to determine the NFS version, but we are running on
> RHEL 5.5.
>
> I have not checked the syslog. Thanks for the tip. I will do that
> after our morning Operations.
>
> We are also very inclined to believe this is an NFS issue, based on
> network-wide behavior that has nothing to do with ActiveMQ, e.g., it
> often takes 10 seconds to list just 5 files in an NFS-mounted
> directory.
>
> So, we are creating an action plan this weekend to eliminate as many
> NFS mount points as possible and to see how that helps the
> situation. The plan needs approval/buy-in from key people, so it may
> be a couple of weeks before it is implemented. In the meantime,
> ActiveMQ either shuts itself down or behaves in rather despondent
> ways, so we find we are having to restart ActiveMQ every 3 or 4 hours
> (and this frequency is slowly increasing).
>
> Once ActiveMQ is restarted, we find that our producers and our
> consumers have to be shut down and relaunched in order to
> reestablish the connection with ActiveMQ. This is a royal pain!
> However, a producer will throw an exception whenever it tries to
> send a message through a lost connection, so I catch the exception,
> close the connection, and reopen it. Thus, my producers are able to
> reconnect automatically in the event ActiveMQ is restarted.
>
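The catch/close/reopen approach described above can be made gentler on a broker that is still coming back up by waiting between attempts with a growing delay, similar in spirit to the failover transport's exponential back-off. The following is a self-contained sketch under stated assumptions: Channel is a hypothetical stand-in for the JMS Connection/Session/MessageProducer trio, and the delay values are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

final class ReconnectingSender {
    // Hypothetical stand-in for a JMS Connection/Session/MessageProducer.
    interface Channel {
        void send(String message) throws Exception;
        void reopen() throws Exception; // close the old connection, open a new one
    }

    // Delays before each retry: initial, initial*multiplier, ..., capped at maxMs.
    static List<Long> backoffSchedule(long initialMs, double multiplier, long maxMs, int retries) {
        List<Long> delays = new ArrayList<>();
        double delay = initialMs;
        for (int i = 0; i < retries; i++) {
            delays.add(Math.min((long) delay, maxMs));
            delay *= multiplier;
        }
        return delays;
    }

    // Sends once; on failure sleeps, reopens the channel and tries again,
    // once per entry in delays. Rethrows the last send failure if all retries fail.
    static void sendWithRetry(Channel channel, String message, List<Long> delays) throws Exception {
        int retry = 0;
        while (true) {
            try {
                channel.send(message);
                return;
            } catch (Exception sendFailure) {
                if (retry == delays.size()) {
                    throw sendFailure; // out of retries: give up
                }
            }
            Thread.sleep(delays.get(retry++));
            try {
                channel.reopen();
            } catch (Exception reopenFailure) {
                // Broker probably still down; the next send fails and we back off again.
            }
        }
    }
}
```

For example, backoffSchedule(1000, 2.0, 30000, 6) yields delays of 1, 2, 4, 8, 16 and 30 seconds, so a restart that takes a minute or so is ridden out without hammering the broker.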
> But with the consumers, no exception is thrown while they wait for
> notifications. A consumer simply waits for a notification that never
> arrives after the connection with ActiveMQ is lost. So what is your
> recommended method for a consumer to check for a disconnection?
> (Maybe I should post this question as a separate thread...)
>
> Mark
>
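On the consumer side, the JMS API lets a client register an ExceptionListener through Connection.setExceptionListener; the provider calls it when it detects a serious problem with the connection, which gives a blocked consumer the signal it otherwise never gets. Below is a sketch of a supervisor that tears the consumer down and rebuilds it when that signal arrives. To keep it runnable without the ActiveMQ jars, the JMS calls are hidden behind a hypothetical ConsumerFactory interface; a real open() would create the connection, register the listener, attach a MessageListener and start the connection.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class ConsumerSupervisor {
    // Hypothetical factory: opens a connection plus consumer and arranges for
    // onFailure to be called when the connection breaks. With JMS, this is where
    // connection.setExceptionListener(onFailure::accept) would go.
    interface ConsumerFactory {
        AutoCloseable open(Consumer<Exception> onFailure) throws Exception;
    }

    private final ConsumerFactory factory;
    private final long retryDelayMs;
    private final BlockingQueue<Exception> failures = new LinkedBlockingQueue<>();

    ConsumerSupervisor(ConsumerFactory factory, long retryDelayMs) {
        this.factory = factory;
        this.retryDelayMs = retryDelayMs;
    }

    // Keeps a consumer open, rebuilding it after each reported failure.
    // Returns after maxFailures failures (pass Integer.MAX_VALUE to run indefinitely).
    int supervise(int maxFailures) throws InterruptedException {
        int handled = 0;
        while (handled < maxFailures) {
            AutoCloseable consumer;
            try {
                consumer = factory.open(failures::offer);
            } catch (Exception brokerUnavailable) {
                Thread.sleep(retryDelayMs); // broker not back yet; try again later
                continue;
            }
            failures.take(); // blocks until the failure callback fires
            handled++;
            try {
                consumer.close();
            } catch (Exception ignored) {
                // the connection is already broken; nothing more to do
            }
            failures.clear(); // drop duplicate reports from the old connection
            Thread.sleep(retryDelayMs);
        }
        return handled;
    }
}
```

With the failover transport in place, reconnection normally happens below the JMS API and, as far as I know, the ExceptionListener only fires once failover gives up, so a supervisor like this matters mostly for plain tcp:// URLs.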
> On 5/29/2013 3:21 AM, rajdavies [via ActiveMQ] wrote:
>
>> Ultimately I'm pretty confident this is an NFS problem, and as Johan
>> has already let the cat out of the bag ;), let me ask the following:
>>
>> Which version of NFS 4 are you using, and in which environment?
>>
>> Have you checked the system logs for NFS errors on all the machines
>> running ActiveMQ brokers?
>>
>> thanks,
>>
>> Rob
>>
>> On 29 May 2013, at 00:46, Christian Posta <[hidden email]> wrote:
>>
>>> I can make two recommendations.
>>>
>>> #1 (the preferred one): create a test case that shows this. That
>>> will give us the best chance of finding out what's going on. Take a
>>> look at the following test cases in the ActiveMQ source code to get
>>> an idea of how to go about it:
>>>
>>> http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/usecases/
>>> http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/bugs/
>>> http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/test/JmsTopicSendReceiveTest.java?view=markup
>>>
>>> #2: if creating a test case doesn't sound like something you want to
>>> get into, I guess, give us the exact configs of the broker and
>>> clients, the number of consumers, number of topics, message sizes,
>>> etc., all the details, and if one of us gets the urge we can try it
>>> out on our boxes. This will not be nearly as good as #1, and it
>>> presents a higher barrier to entry, because we spend our spare time
>>> doing this and like to spend that time debugging and fixing, not
>>> setting up environments and use cases which may not even show a
>>> bug :)
>>>
>>> On Tue, May 28, 2013 at 4:34 PM, fenbers <[hidden email]> wrote:
>>>
>>>> I'm getting the Sync exception on both local disk and NFS.
>>>> Originally, I was only using a local disk, but there wasn't much
>>>> disk space for the ever-growing list of 33MB enumerated .log files
>>>> that weren't cleaned up. So I reconfigured ActiveMQ to put these db
>>>> files on an NFS mount. But the sync exceptions occurred either way.
>>>>
>>>> I've changed *all* my consumers to AUTO_ACKNOWLEDGE, thinking that
>>>> maybe an acknowledgement leak was causing the undeleted files. That
>>>> didn't help... The TRACE level logging points to only two of my 5
>>>> topics as accumulating these undeleted db files. So I've
>>>> concentrated my scrutiny on the consumers of these two topics, but
>>>> have not found anything out of the ordinary.
>>>>
>>>> What still puzzles me is that the frequency of the log file build-up
>>>> and the frequency of exceptions continue to increase even though the
>>>> number of messages sent per day by the producers remains nearly
>>>> constant...
>>>>
>>>> Mark
>>>>
>>>> On 5/28/2013 6:06 PM, ceposta [via ActiveMQ] wrote:
>>>>
>>>>> Sounds like there are multiple issues...
>>>>>
>>>>> Your journal files aren't being cleaned up, AND you're getting the
>>>>> Sync exception?
>>>>>
>>>>> Do you get the sync exception on a local disk mount? Or just NFS?
>>>>>
>>>>> If the journals aren't being cleaned up, are your consumers
>>>>> properly ack'ing messages?
>>>>>
>>>>> On Tue, May 28, 2013 at 2:42 PM, fenbers <[hidden email]> wrote:
>>>>>
>>>>>> I would LOVE to help you help me! But I have no idea how to go
>>>>>> about making a test case. If you could drop some hints in this
>>>>>> regard, I might be able to produce one.
>>>>>>
>>>>>> My ActiveMQ issues seem to be related to network slowness, which
>>>>>> we are diagnosing separately. Or maybe it is the other way around,
>>>>>> where ActiveMQ problems are causing network sluggishness. Either
>>>>>> way, there seems to be a correlation, except that when network
>>>>>> responsiveness improves, ActiveMQ does not.
>>>>>>
>>>>>> The problem I'm having with AMQ is progressive, which is even more
>>>>>> puzzling, because we are not adding to the number of messages
>>>>>> that AMQ has to handle. Today, we were up to 191 undeleted
>>>>>> db-NNN.log files in the database directory before I stopped AMQ
>>>>>> and deleted them. NNN was up to 451, so 260 files had been cleaned
>>>>>> up by AMQ's automatic processes...
>>>>>>
>>>>>> Will log files assist you in helping me? I have TRACE level
>>>>>> messages turned on, so they are quite large.
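
The arithmetic in the last quoted paragraphs (highest index 451 with 191 files still present, so 451 - 191 = 260 already removed) assumes the journal numbering started at db-1.log. A small sketch that repeats that check against the data directory, so the build-up can be tracked over time, follows. The db-N.log naming comes from the thread; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JournalStats {
    private static final Pattern JOURNAL = Pattern.compile("db-(\\d+)\\.log");

    // Returns the indices of all db-N.log files in the directory, sorted ascending.
    static List<Integer> journalIndices(Path dir) throws IOException {
        List<Integer> indices = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "db-*.log")) {
            for (Path f : files) {
                Matcher m = JOURNAL.matcher(f.getFileName().toString());
                if (m.matches()) { // skip names like db-x.log that the glob lets through
                    indices.add(Integer.parseInt(m.group(1)));
                }
            }
        }
        Collections.sort(indices);
        return indices;
    }

    // Files already deleted, assuming numbering started at 1:
    // highest index minus the number of files still present.
    static int cleanedUp(List<Integer> sortedIndices) {
        if (sortedIndices.isEmpty()) {
            return 0;
        }
        return sortedIndices.get(sortedIndices.size() - 1) - sortedIndices.size();
    }
}
```

Logging journalIndices(dir).size() alongside cleanedUp(...) every hour or so would show whether the number of retained files grows steadily or jumps around the times the sync exceptions appear.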



-- 
*Christian Posta*
http://www.christianposta.com/blog
twitter: @christianposta
