Date: Sun, 2 Jun 2013 20:10:47 -0700
Subject: Re: ActiveMQ crashes frequently
From: Christian Posta
To: users@activemq.apache.org

You should check out the failover transport to handle reconnecting.

On Sunday, June 2, 2013, fenbers wrote:

> I don't know how to determine the NFS version, but we are running on
> RHEL 5.5.
>
> I have not checked the syslog. Thanks for the tip. I will do that
> after our morning Operations.
>
> We are also very inclined to believe this is an NFS issue, based on
> behaviors network-wide which have nothing to do with ActiveMQ, e.g.,
> often taking 10 seconds to list just 5 files in an NFS-mounted
> directory.
>
> So, we are creating an action plan this weekend to eliminate as many
> NFS mount points as possible, and seeing how that helps the
> situation. The plan needs approval/buy-in from key people to be
> implemented, so it may be a couple of weeks before it is in place.
> In the meantime, ActiveMQ either shuts itself down or behaves in
> rather despondent ways, so we find we are having to restart ActiveMQ
> every 3 or 4 hours (and this frequency is slowly increasing).
>
> Once ActiveMQ is rebooted, we find that our producers and our
> consumers have to be shut down and relaunched in order to
> reestablish the connection with ActiveMQ. This is a royal pain!
> However, a producer will throw an exception whenever it tries to
> send a message through a lost connection, so I catch the
> exception, close the connection, and reopen it. Thus, my
> producers are able to reconnect automatically in the event ActiveMQ
> is restarted.
>
> But with the consumers, no exception is thrown while they wait for
> notifications. A consumer simply waits for a notification that never
> arrives after the connection with ActiveMQ is lost. So what is your
> recommended method for a consumer to check for a disconnection?
> (Maybe I should post this question as a separate thread...)
>
> Mark
>
> On 5/29/2013 3:21 AM, rajdavies [via ActiveMQ] wrote:
>
> > Ultimately I'm pretty confident this problem is an NFS problem, and
> > as Johan has already let the cat out of the bag ;) let me ask the
> > following:
> >
> > Which version of NFS 4 are you using and which environment?
> >
> > Have you checked the system logs for NFS errors on all the
> > machines running ActiveMQ brokers?
> >
> > thanks,
> >
> > Rob
> >
> > On 29 May 2013, at 00:46, Christian Posta <[hidden email]> wrote:
> >
> > > I can make two recommendations.
> > >
> > > #1, being the preferred, create a test case that shows this... that
> > > will give us the best chance of finding out what's going on... take
> > > a look at the following test cases in the activemq source code to
> > > give you an idea about how to go about doing it...
> > >
> > > http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/usecases/
> > >
> > > http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/bugs/
> > >
> > > http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/test/JmsTopicSendReceiveTest.java?view=markup
> > >
> > > #2, if creating a test case doesn't sound like something you want
> > > to get into.. I guess, give us the exact configs of broker,
> > > clients, number of consumers, number of topics, message sizes, etc,
> > > etc all details and if one of us gets the urge we can try it out on
> > > our boxes. This will not be nearly as good as #1, and will provide
> > > a higher barrier to entry because we spend our spare time doing
> > > this and like to spend that time debugging and fixing, and not
> > > setting up environments and usecases which may not even show a bug
> > > :)
> > >
> > > On Tue, May 28, 2013 at 4:34 PM, fenbers <[hidden email]> wrote:
> > >
> > > > I'm getting the Sync exception on both, local and NFS.
> > > > Originally, I was only using a local disk, but there wasn't much
> > > > disk space for the ever growing list of 33MB enumerated .log
> > > > files that weren't cleaned up. So I reconfigured ActiveMQ to put
> > > > these db files on an NFS mount. But the sync exceptions occurred
> > > > either way.
> > > >
> > > > I've changed *all* my consumers to AUTO_ACKNOWLEDGE, thinking
> > > > that maybe an ACKNOWLEDGEment leak was causing the undeleted
> > > > files. That didn't help... The TRACE level logging points to only
> > > > two of my 5 topics that accumulate these undeleted db files. So
> > > > I've concentrated my scrutiny on consumers of these two topics,
> > > > but have not found anything out of the ordinary.
> > > >
> > > > What is puzzling me still is that the frequency of the log file
> > > > build-up and the frequency of exceptions continue to increase
> > > > even though the number of messages sent per day by the producers
> > > > remains nearly constant...
> > > >
> > > > Mark
> > > >
> > > > On 5/28/2013 6:06 PM, ceposta [via ActiveMQ] wrote:
> > > >
> > > > > Sounds like there are multiple issues...
> > > > >
> > > > > Your journal files aren't being cleaned up, AND you're getting
> > > > > the Sync exception?
> > > > >
> > > > > You get the sync exception on local disk mount? Or just NFS?
> > > > >
> > > > > If the journals aren't being cleaned up, are your consumers
> > > > > properly ack'ing messages?
> > > > >
> > > > > On Tue, May 28, 2013 at 2:42 PM, fenbers <[hidden email]> wrote:
> > > > >
> > > > > > I would LOVE to help you help me! But I have no idea how to
> > > > > > go about making a test case. If you could drop some hints in
> > > > > > this regard, I might be able to produce one.
> > > > > >
> > > > > > My ActiveMQ issues seem to be related to network slowness,
> > > > > > which we are diagnosing separately. Or maybe it is the other
> > > > > > way around, where ActiveMQ problems are causing network
> > > > > > sluggishness. Either way, there seems to be a correlation,
> > > > > > except that when network responsiveness improves, ActiveMQ
> > > > > > does not.
> > > > > >
> > > > > > The problem I'm having with AMQ is progressive, which is even
> > > > > > more puzzling, because we are not adding to the number of
> > > > > > messages that AMQ has to handle. Today, we were up to 191
> > > > > > undeleted db-NNN.log files in the database directory before I
> > > > > > stopped AMQ and deleted them. NNN was up to 451, so 260 files
> > > > > > had been cleaned up by AMQ's automatic processes...
> > > > > >
> > > > > > Will log files assist you in helping me? I have TRACE level
> > > > > > messages turned on, so they are quite large.

--
*Christian Posta*
http://www.christianposta.com/blog
twitter: @christianposta
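
For reference, the failover transport recommended at the top of this message is selected through the broker URL that the client connects with. What follows is only a minimal sketch of how such a URL might be assembled: the class name FailoverUrl, the broker host names, and the option values are all hypothetical and not taken from this thread. The option names (initialReconnectDelay, maxReconnectDelay, maxReconnectAttempts) are failover transport options, and the -1 value assumes ActiveMQ 5.6 or later, where it means "retry forever".

```java
// Sketch only: builds an ActiveMQ failover transport URI from plain strings.
// The class name, host names and option values are illustrative assumptions.
final class FailoverUrl {

    private FailoverUrl() {}

    /**
     * Wraps one or more broker URIs in the composite failover form, e.g.
     * failover:(tcp://a:61616,tcp://b:61616)?option=value
     */
    static String build(String... brokerUris) {
        if (brokerUris.length == 0) {
            throw new IllegalArgumentException("at least one broker URI is required");
        }
        // The failover transport takes a comma-separated list of broker URIs.
        String brokers = String.join(",", brokerUris);
        return "failover:(" + brokers + ")"
                + "?initialReconnectDelay=1000"  // ms to wait before the first retry
                + "&maxReconnectDelay=30000"     // upper bound on the retry delay, in ms
                + "&maxReconnectAttempts=-1";    // assumed 5.6+ semantics: retry forever
    }
}
```

The resulting string would be passed as the broker URL to ActiveMQConnectionFactory in both producers and consumers. With failover in place the client library re-establishes the connection itself, so consumers should not need to be relaunched after a broker restart. Where failover is not used, registering a JMS ExceptionListener via Connection.setExceptionListener is the usual way for a consumer to be told that its connection has failed.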