camel-users mailing list archives

From Quinn Stevenson <qu...@pronoia-solutions.com>
Subject Re: Transaction problem with Camel, ActiveMQ and Spring JMS
Date Mon, 08 Feb 2016 15:55:49 GMT
Hi Stephan - 

I have a keen interest in this one because my customers rely very heavily on NO MESSAGE LOSS
- period (they are mostly Healthcare Providers).

I was comparing configurations again, and I came up with one more difference - can you try
setting transacted=false in the “standard” test?  I ran it that way about 6-7 times and
didn’t ever lose a message between the queues.  I’m using ActiveMQ 5.11.4 and Camel 2.16.2
for this test.

Also, in the real world, would you be doing this queue to queue work using one or two ActiveMQ
brokers?  If you’d only be using one, you may want to try camel-sjms - I’ve had pretty
good luck with it, but it doesn’t support XA.  

If you’d be using XA in the real world, we should test that scenario as well.  It feels
like the “standard” configuration we’re using is almost an XA config, but not quite.
What you’ve called the noTxManager config is using internal JMS transactions and it’s working
- that’s closer to the camel-jms configuration I’d use for a single broker.
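
A minimal sketch of why the two cases can behave differently, assuming the analysis quoted further down this thread is right (the consume and the send end up committed separately, with the consume committed first). This is a toy in-memory model in plain Java, not Camel, Spring JMS or ActiveMQ code; `ToyBroker`, `moveWithTwoCommits` and `moveWithOneCommit` are hypothetical names made up for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CommitWindowModel {

    /** Toy in-memory stand-in for a broker with two queues; not a JMS implementation. */
    static class ToyBroker {
        final Deque<String> input = new ArrayDeque<>();
        final Deque<String> output = new ArrayDeque<>();
    }

    /**
     * Models two independent commits: the consume becomes final first, the send second.
     * If the broker stops between the two, the message is on neither queue.
     */
    static void moveWithTwoCommits(ToyBroker broker, boolean brokerStopsBetweenCommits) {
        String message = broker.input.poll();   // commit 1: the consume is final
        if (message == null || brokerStopsBetweenCommits) {
            return;                             // commit 2 never happens
        }
        broker.output.add(message);             // commit 2: the send is final
    }

    /**
     * Models one commit covering both the consume and the send.
     * If the broker stops before the commit, nothing changes and the message stays on "input".
     */
    static void moveWithOneCommit(ToyBroker broker, boolean brokerStopsBeforeCommit) {
        String message = broker.input.peek();   // received, but not yet committed
        if (message == null || brokerStopsBeforeCommit) {
            return;                             // rolled back: "input" still holds the message
        }
        broker.input.poll();                    // the single commit applies both effects
        broker.output.add(message);
    }

    public static void main(String[] args) {
        ToyBroker twoCommits = new ToyBroker();
        twoCommits.input.add("msg-281");
        moveWithTwoCommits(twoCommits, true);
        System.out.println("two commits: input=" + twoCommits.input.size()
                + " output=" + twoCommits.output.size());

        ToyBroker oneCommit = new ToyBroker();
        oneCommit.input.add("msg-281");
        moveWithOneCommit(oneCommit, true);
        System.out.println("one commit:  input=" + oneCommit.input.size()
                + " output=" + oneCommit.output.size());
    }
}
```

In the model, a broker stop in the window loses the message only in the two-commit case; with one commit the message is still on the input queue and can be redelivered. Whether the "standard" configuration really behaves like the two-commit case is exactly what the tests in this thread are trying to establish.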


> On Feb 6, 2016, at 2:32 PM, Stephan Burkard <sburkard@gmail.com> wrote:
> 
> Hi Quinn
> 
> I don't think that you need to match my broker version exactly. I first
> discovered this issue on ActiveMQ 5.9.0 standard edition. I guess that
> simply every broker version suffers from this. I really don't think it is
> an ActiveMQ problem. According to Red Hat, it is a Spring JMS problem.
> 
> No, I never tried to use an embedded broker. Probably because I used remote
> brokers when I discovered the problem during Master-Slave failover tests. I
> will try to rewrite the test project to use an embedded broker that can be
> stopped and started as part of the test.
> 
> Yes, that's what I meant: the remote broker increases the probability
> of showing the issue. If Red Hat's analysis is correct, it really is a
> timing issue. You can also increase the chance of hitting the issue if
> you produce even more messages per second. That increases the probability
> that a message falls just into the problematic time slice where the
> consumer has committed but the producer has not.
> 
> Yes, that's right. I start the test and when I see lots of console output I
> hit enter on the console where the broker's stop command is waiting.
> Then I wait about 5 to 10 seconds and then I start the broker again. The
> test reconnects and continues.
> 
> Regards
> Stephan
> 
> 
> 
> 
> 
> On Fri, Feb 5, 2016 at 7:40 PM, Quinn Stevenson <quinn@pronoia-solutions.com> wrote:
> 
>> Stephan -
>> 
>> I’ll get a broker running and try to match your version - I think I can
>> get it from one of my customers who’s running Fuse 6.2.
>> 
>> While I do that - have you considered trying to reproduce this using an
>> embedded broker that the test could control?  It would make it much easier
>> to reproduce.
>> 
>> I don’t think running the broker locally vs remotely should increase the
>> probability of losing messages - we shouldn’t lose any as long as the
>> configuration is correct.  It may increase the probability of an issue, but we
>> shouldn’t lose messages.
>> 
>> Also, just to confirm - when you’re testing this you are stopping/starting
>> the broker in the middle of the test, not killing and restarting the broker
>> - correct?
>> 
>> 
>>> On Feb 5, 2016, at 12:37 AM, Stephan Burkard <sburkard@gmail.com> wrote:
>>> 
>>> Hi Quinn
>>> 
>>> I just tested the POM changes you posted and the second run failed (without
>>> failover-URL). I then tested with the failover-URL and the third attempt
>>> failed.
>>> 
>>> The latter is no big surprise since I discovered the problem during
>>> failover tests in a master-slave-config. I then reduced the setup to a
>>> single broker environment and it was still there.
>>> 
>>> My test broker is apache-activemq-5.11.0.redhat-620133, a patched Red Hat
>>> version of AMQ 5.11. Like you, I don't change the AMQ version number in
>>> the POM, I just use a newer broker than the library version. My broker runs
>>> on a different machine than the test. Perhaps this increases the probability of
>>> losing a message?
>>> 
>>> Regards
>>> Stephan
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Feb 4, 2016 at 7:06 PM, Quinn Stevenson <quinn@pronoia-solutions.com> wrote:
>>> 
>>>> I tested this with a 5.9.0 broker and I am seeing messages dropped with
>>>> the TxText, but I still have to use the failover URL or the test just stops
>>>> after the broker is restarted.
>>>> 
>>>> I don’t have a 5.9.1 broker to test with, so I don’t know if that would
>>>> help, but the next oldest broker I have is 5.10.1, and it seems to be
>>>> working with that broker.
>>>> 
>>>> NOTE:  I’m not changing the activemq-version in the POM when I change the
>>>> broker version - I’m just starting a different broker (locally) on the same
>>>> port.
>>>> 
>>>> 
>>>>> On Feb 4, 2016, at 10:41 AM, Quinn Stevenson <quinn@pronoia-solutions.com> wrote:
>>>>> 
>>>>> I still can’t make either test drop messages between the input and the
>>>>> output queue with the POM changes I sent, but I did find one difference
>>>>> between what you’ve done and what I normally do that changes the output I’m
>>>>> seeing - I always use a failover URL
>>>>> 
>>>>> <property name="brokerURL" value="failover:(tcp://localhost:61616?wireFormat.tightEncodingEnabled=false)"/>
>>>>> 
>>>>> My test broker is v 5.10.1 as well - I’ll see if it makes any difference
>>>>> with 5.9.0
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Feb 4, 2016, at 9:52 AM, Quinn Stevenson <quinn@pronoia-solutions.com> wrote:
>>>>>> 
>>>>>> It is strange - I’m trying to compare what you have in the “standard”
>>>>>> version to what I did before.  We tested our configs pretty heavily under
>>>>>> all sorts of strange conditions to verify we weren’t losing messages, but
>>>>>> we were using newer versions of Camel and ActiveMQ.
>>>>>> 
>>>>>> So we’re on the same page - can you try your tests again with POM
>>>>>> dependencies that look something like this?
>>>>>> 
>>>>>> <properties>
>>>>>>   <camel-version>2.12.5</camel-version>
>>>>>>   <activemq-version>5.9.0</activemq-version>
>>>>>> </properties>
>>>>>> 
>>>>>> <dependencies>
>>>>>>   <dependency>
>>>>>>       <groupId>org.apache.activemq</groupId>
>>>>>>       <artifactId>activemq-all</artifactId>
>>>>>>       <version>${activemq-version}</version>
>>>>>>   </dependency>
>>>>>>   <dependency>
>>>>>>       <groupId>org.apache.activemq</groupId>
>>>>>>       <artifactId>activemq-pool</artifactId>
>>>>>>       <version>${activemq-version}</version>
>>>>>>   </dependency>
>>>>>> 
>>>>>>   <dependency>
>>>>>>       <groupId>org.apache.camel</groupId>
>>>>>>       <artifactId>camel-spring</artifactId>
>>>>>>       <version>${camel-version}</version>
>>>>>>   </dependency>
>>>>>>   <dependency>
>>>>>>       <groupId>org.apache.camel</groupId>
>>>>>>       <artifactId>camel-jms</artifactId>
>>>>>>       <version>${camel-version}</version>
>>>>>>   </dependency>
>>>>>> 
>>>>>>   <dependency>
>>>>>>       <groupId>org.apache.camel</groupId>
>>>>>>       <artifactId>camel-test-spring</artifactId>
>>>>>>       <version>${camel-version}</version>
>>>>>>       <scope>test</scope>
>>>>>>   </dependency>
>>>>>> 
>>>>>>   <dependency>
>>>>>>       <groupId>commons-collections</groupId>
>>>>>>       <artifactId>commons-collections</artifactId>
>>>>>>       <version>3.2.1</version>
>>>>>>       <scope>test</scope>
>>>>>>   </dependency>
>>>>>>   <dependency>
>>>>>>       <groupId>org.hamcrest</groupId>
>>>>>>       <artifactId>hamcrest-integration</artifactId>
>>>>>>       <version>1.3</version>
>>>>>>       <scope>test</scope>
>>>>>>   </dependency>
>>>>>> 
>>>>>> </dependencies>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Feb 4, 2016, at 9:49 AM, Stephan Burkard <sburkard@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Quinn
>>>>>>> 
>>>>>>> The "standard" version is the big mystery. As I stated in my first post, a
>>>>>>> Red Hat engineer analysed a similar project (with less book-keeping and
>>>>>>> logging stuff) and his conclusion was that as soon as a transaction manager
>>>>>>> is explicitly defined, Spring JMS Template (that is used by Camel under the
>>>>>>> hood) creates two of them, whether by bug, by accident or just by strange
>>>>>>> behaviour.
>>>>>>> 
>>>>>>> This conclusion was quite surprising since it meant that all our Camel-JMS
>>>>>>> projects are theoretically suffering from message loss.
>>>>>>> 
>>>>>>> The "no-tx" version should definitely be OK, see also CAMEL-5055 for the
>>>>>>> "lazyCreateTransactionManager" flag. The JMS transaction manager may not be
>>>>>>> defined but it creates one implicitly because of "transacted = true".
>>>>>>> 
>>>>>>> The two "flaws" you mentioned are perhaps an issue. It would be somewhat
>>>>>>> reassuring if it is my project that has a flaw.
>>>>>>> 
>>>>>>> Regards
>>>>>>> Stephan
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Feb 4, 2016 at 4:44 PM, Quinn Stevenson <quinn@pronoia-solutions.com> wrote:
>>>>>>> 
>>>>>>>> I’m still going through the project, but the first couple of things that
>>>>>>>> jump out at me are you have two Spring versions - the one you explicitly
>>>>>>>> put in your POM (3.2.8.RELEASE) and the one pulled in by camel-spring
>>>>>>>> (3.2.11.RELEASE).  Also, camel-spring should be included in the POM since
>>>>>>>> you’re using Spring routes.  I’m not sure if that’s enough to cause issues
>>>>>>>> or not.
>>>>>>>> 
>>>>>>>> I believe what’s going on with the “no-tx” version is you’re actually
>>>>>>>> using JMS transactions since you still have transacted set to true in the
>>>>>>>> JmsConfiguration.
>>>>>>>> 
>>>>>>>> I’m not sure what’s going on with the “standard” version - it looks
>>>>>>>> similar to some XA stuff I’ve set up before (because I had multiple brokers
>>>>>>>> involved) except I had to use XA Connection Factories.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Feb 3, 2016, at 3:12 PM, Stephan Burkard <sburkard@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Yes, same broker. There is only one ActiveMQ connection config in the
>>>>>>>>> project.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Feb 3, 2016 at 8:00 PM, Quinn Stevenson <quinn@pronoia-solutions.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Are both the source and destination queues hosted by the same ActiveMQ
>>>>>>>>>> broker?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Feb 3, 2016, at 8:21 AM, Stephan Burkard <sburkard@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi
>>>>>>>>>>> 
>>>>>>>>>>> I have built a small Maven project (attached) to demonstrate a JMS
>>>>>>>>>>> transaction problem in Camel routes under certain load conditions. In fact
>>>>>>>>>>> I am losing messages between two queues.
>>>>>>>>>>> 
>>>>>>>>>>> The project contains two different flavours of the same test. One of
>>>>>>>>>>> them suffers from the problem, the other (according to my tests) does not.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> *** What does the testcase do?
>>>>>>>>>>> 1. Produces 1000 messages (100/s) and sends them to an "input" queue.
>>>>>>>>>>> 2. Sends the messages from the "input" queue to an "output" queue.
>>>>>>>>>>> 3. Finally consumes the messages from the "output" queue to count them.
>>>>>>>>>>> 
>>>>>>>>>>> *** What is the difference between the two test flavours?
>>>>>>>>>>> - There is a "standard" flavour that suffers from the problem
>>>>>>>>>>> - And there is a "noTxManager" flavour that seems not to have the problem
>>>>>>>>>>> - The "standard" flavour is kind of a well-known Camel/ActiveMQ configuration
>>>>>>>>>>>   - with a Spring transaction manager
>>>>>>>>>>>   - with a Spring transaction policy
>>>>>>>>>>>   - with a "transacted" flag in Camel routes
>>>>>>>>>>> - The "noTxManager" flavour is a "simple" configuration
>>>>>>>>>>>   - no Spring transaction manager
>>>>>>>>>>>   - no Spring transaction policy
>>>>>>>>>>>   - no "transacted" flag in Camel routes
>>>>>>>>>>>   - BUT: "lazyCreateTransactionManager" = false (so routes are transacted too)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> *** How to run the testcases?
>>>>>>>>>>> 1. Replace "[yourBrokerHost]" with the hostname of your ActiveMQ broker
>>>>>>>>>>> 2. Run the testcase as a JUnit test
>>>>>>>>>>> 3. When you see lots of console messages saying that messages are sent, stop
>>>>>>>>>>> your ActiveMQ broker (do not kill -9 it, just shut it down normally)
>>>>>>>>>>> 4. Exceptions are thrown on the console output
>>>>>>>>>>> 5. After some seconds start your broker again
>>>>>>>>>>> 6. The test finishes normally and after some seconds dumps the book-keeping
>>>>>>>>>>> on the console
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> *** How to interpret the results?
>>>>>>>>>>> - When the test is successful, no message is lost. You can run the test
>>>>>>>>>>> without broker shutdown/startup and it will obviously always be successful.
>>>>>>>>>>> - When the test fails, one or more messages are lost between queue
>>>>>>>>>>> "input" and "output". In my tests I was not able to run the "standard"
>>>>>>>>>>> flavour three times in a row successfully. About every second run failed.
>>>>>>>>>>> In contrast, the "noTxManager" flavour never failed in my tests.
>>>>>>>>>>> 
>>>>>>>>>>> The book-keeping for a failed test looks like the following. In this
>>>>>>>>>>> example message number 281 arrived at the input queue but not at the
>>>>>>>>>>> output queue. So it is lost.
>>>>>>>>>>> 
>>>>>>>>>>> Messages created by Client:          1000
>>>>>>>>>>> Client Exceptions during send:       0 []
>>>>>>>>>>> 
>>>>>>>>>>> Messages received at input queue:    993
>>>>>>>>>>> Missing Messages at input queue:     7 [282,283,284,285,286,287,288]
>>>>>>>>>>> Duplicate Messages at input queue:   0 []
>>>>>>>>>>> 
>>>>>>>>>>> Messages received at output queue:   992
>>>>>>>>>>> Missing Messages at output queue:    8 [281,282,283,284,285,286,287,288]
>>>>>>>>>>> Duplicate Messages at output queue:  0 []
>>>>>>>>>>> 
>>>>>>>>>>> Lost Messages between Queues:        1 [281]
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> *** What is the problem?
>>>>>>>>>>> A Red Hat engineer tracked the problem down to a Spring JMS template
>>>>>>>>>>> behaviour that is kind of strange. If a Spring transaction manager is
>>>>>>>>>>> defined in the config, it will end up with two of them. This opens a small
>>>>>>>>>>> time window in which messages can get lost, and the window is only hit
>>>>>>>>>>> under a certain load.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> *** So, what is my question?
>>>>>>>>>>> - Does this really mean that it is unsafe to use the "standard" flavour
>>>>>>>>>>> of configuration?
>>>>>>>>>>> - Is there another config with TxManager etc. that works correctly?
>>>>>>>>>>> - What are the limits of the "noTxManager" config? When is it not
>>>>>>>>>>> sufficient?
>>>>>>>>>>> 
>>>>>>>>>>> Regards
>>>>>>>>>>> Stephan
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> <CamelAmqTxTest.zip>
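
The book-keeping shown in the original report can be sketched with plain set differences. This is a minimal sketch, not the code from the attached test project (which is not shown in this thread); the class `BookKeeping` and its methods `range` and `missing` are hypothetical names, the ids are the ones from the example report, and duplicate detection is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BookKeeping {

    /** Returns the ids from first to last (inclusive) as a sorted, mutable set. */
    public static Set<Integer> range(int first, int last) {
        Set<Integer> ids = new TreeSet<>();
        for (int id = first; id <= last; id++) {
            ids.add(id);
        }
        return ids;
    }

    /** Returns, in ascending order, the ids in 'expected' that are absent from 'seen'. */
    public static List<Integer> missing(Set<Integer> expected, Set<Integer> seen) {
        Set<Integer> result = new TreeSet<>(expected);
        result.removeAll(seen);
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        Set<Integer> created = range(1, 1000);

        // Ids from the example report: 282-288 never reached "input",
        // 281-288 never reached "output".
        Set<Integer> atInput = range(1, 1000);
        atInput.removeAll(range(282, 288));
        Set<Integer> atOutput = range(1, 1000);
        atOutput.removeAll(range(281, 288));

        System.out.println("Missing at input queue:  " + missing(created, atInput));
        System.out.println("Missing at output queue: " + missing(created, atOutput));
        // A message is lost between the queues if "input" saw it but "output" did not.
        System.out.println("Lost between queues:     " + missing(atInput, atOutput));
    }
}
```

Messages missing at the input queue were never accepted by the broker, so the client can still resend them. Only the last difference, here message 281, points at a loss inside the route.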

