activemq-users mailing list archives

From Tim Bain <tb...@alumni.duke.edu>
Subject Re: Need help investigating committed messages resurfacing after restart.
Date Thu, 26 Oct 2017 05:59:44 GMT
I've never seen anything like this on the mailing list, and the one project
where I've been directly involved in managing brokers used non-persistent
messaging exclusively so I've never seen it personally.

Here are a few crazy ideas of what might possibly cause the behavior you
described:

   - As you said, maybe a data file containing acks for earlier (old)
   messages was deleted.
      - Maybe a bug in the ack compaction feature that was added recently
      to prevent a few old messages from keeping alive chains of data files?
      - Maybe something within the environment and completely unrelated to
      ActiveMQ (a cron job that archives files that haven't been touched in a
      certain amount of time, for example)?
   - Maybe the acks were never actually put into KahaDB to begin with, and
   the client was working off of state that existed only in memory.
      - How did you determine that all messages on the queue were consumed
      before the restart? I'd recommend you confirm that EnqueueCount and
      DequeueCount match for the queue when viewed via a JMX viewer such as
      JConsole.
   - Maybe the consumption of the messages was within a transaction that
   was rolled back for some reason after some long period of time? I don't
   think transactions actually allow that, but I don't know them well
   enough to be sure.
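
If you'd rather script the EnqueueCount/DequeueCount comparison than eyeball
it in JConsole, a small JMX client along these lines should work. This is a
sketch, not a tested tool: the service URL, the broker name "localhost" (the
ActiveMQ default), and the queue name are all placeholders you'd adjust for
the customer's environment.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class QueueCountCheck {

    /** A queue is fully consumed when every enqueued message was dequeued. */
    static boolean fullyConsumed(long enqueueCount, long dequeueCount) {
        return enqueueCount == dequeueCount;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: QueueCountCheck <jmx-service-url> <queue-name>");
            return;
        }
        // e.g. service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi
        JMXServiceURL url = new JMXServiceURL(args[0]);
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // brokerName=localhost is the default; match it to the
            // brokerName attribute in your activemq.xml.
            ObjectName queue = new ObjectName(
                "org.apache.activemq:type=Broker,brokerName=localhost,"
                + "destinationType=Queue,destinationName=" + args[1]);
            long enq = (Long) mbsc.getAttribute(queue, "EnqueueCount");
            long deq = (Long) mbsc.getAttribute(queue, "DequeueCount");
            System.out.println("EnqueueCount=" + enq + ", DequeueCount=" + deq
                + (fullyConsumed(enq, deq) ? " -> all consumed" : " -> MISMATCH"));
        }
    }
}
```

Capturing that output right before a planned shutdown would tell you whether
the queue really was drained at the broker, not just from the client's point
of view.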

All of those seem pretty crazy, and the least-crazy one does sound like the
deletion of a data file, but I'd try to rule the other two out if you can.

Assuming it really is a problem of KahaDB losing the log files, I'd suggest
getting the customer to do the following:

   1. Archive all KahaDB data files to ensure that you've got the full set
   to work with.
   2. Enable TRACE-level logging as described in
   http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.htm.
   3. When a problem occurs, pull the full set of data files to your
   production environment. Walk backwards through the TRACE-level logs,
   recreating the set of data files that existed after each checkpoint
   operation and restarting the broker. You're looking for the point where
   upon restart of the broker, the "resurrected" messages aren't considered to
   be live anymore. Then you know which deletion was the one that caused it,
   and you can scan the surrounding logs to see if there's any indication of
   what might have gone wrong.
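
For reference, the log4j.properties fragment for step 2 looks roughly like
the following (the appender name, file location, and rollover sizes are just
conventions; double-check against the page linked above for your version):

```properties
# Dedicated rolling log for KahaDB cleanup/GC tracing
log4j.appender.kahadb=org.apache.log4j.RollingFileAppender
log4j.appender.kahadb.file=${activemq.base}/data/kahadb.log
log4j.appender.kahadb.maxFileSize=1024KB
log4j.appender.kahadb.maxBackupIndex=5
log4j.appender.kahadb.append=true
log4j.appender.kahadb.layout=org.apache.log4j.PatternLayout
log4j.appender.kahadb.layout.ConversionPattern=%d [%-15.15t] %-5p %-30.30c{1} - %m%n

# TRACE on MessageDatabase logs which data files each checkpoint keeps and why
log4j.logger.org.apache.activemq.store.kahadb.MessageDatabase=TRACE, kahadb
```

The MessageDatabase TRACE output is what lets you reconstruct, checkpoint by
checkpoint, which data files were considered live and which were candidates
for deletion.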

It's not a very strong debugging plan, but I'm struggling to think of
another way to troubleshoot this, so maybe it'll be useful to you.

Tim

On Mon, Oct 23, 2017 at 6:00 AM, hakanj <hakan.johansson@jeppesen.com>
wrote:

> We have a very strange problem with our ActiveMQ broker that happens
> when we restart it after running it for several months. We have not
> been able to reproduce it in-house. It only happens in production at
> our customer's site. At this time we have seen this issue at least
> three times.
>
> What happens is that some very old (~1 month) messages come back from
> the dead when we restart the broker. The last time this happened we saw
> ~1600 resurrected messages. We can see in our logs that these messages
> have already been processed. There have been more messages written to
> the same queue, but these did not come back to life.
>
> At the time of broker shutdown the problematic queue was empty with no
> consumers or producers connected (they were stopped before the broker
> was stopped).
>
> We tried deleting the kahadb directory when this happened the last
> time, but that didn't help.
>
> The log does not contain any errors during startup. We get a couple of
> rows similar to "Recovery replayed 170629 operations" in the log, but
> this is normal as far as I know. If I am wrong about this being normal,
> please let me know.
>
> ---
>
> We have a mix of applications that communicate using an ActiveMQ
> broker. The Java applications, like Wildfly, use the OpenWire protocol.
> The C++ applications use the AMQP protocol, with the
> "transport.transformer=jms" setting in "activemq.xml".
>
> The C++ applications use transactional sessions to consume messages
> from the broker. The problematic queues are consumed by one of these
> C++ applications. The messages are produced by one of the Java/Wildfly
> applications.
>
> Most, but not all, messages that are sent through the broker are
> non-persistent. The messages on the problem queues are persistent.
>
> We use queues for everything. Topics are not used.
>
> There is only a single ActiveMQ broker instance.
>
> ActiveMQ version: 5.14.5.
>
> ---
>
> We have tried to trigger this error in-house, but without any success.
> We have only seen it in our customer's production environment.
>
> Our own hypothesis is that the kahadb data file that contains the acks
> is deleted, but not the file containing the actual messages. After
> restart it then looks like the messages were never sent. Even if this
> is true we have no idea how it can happen. All our manual tests show
> that ActiveMQ does the right thing. There must be some special case
> that we cannot find.
>
> Does anyone have an idea what could be the cause of this issue?
> Has anyone seen anything like this before?
> Any ideas on how to get a reproducible case?
>
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>
