activemq-dev mailing list archives

From "Rob Davies (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AMQ-2317) Duplicate messages with transacted persistent messages during JDBC Master/Slave failover
Date Fri, 04 Sep 2009 18:59:21 GMT

     [ https://issues.apache.org/activemq/browse/AMQ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Davies updated AMQ-2317:
----------------------------

    Fix Version/s: 5.4.0

> Duplicate messages with transacted persistent messages during JDBC Master/Slave failover
> ----------------------------------------------------------------------------------------
>
>                 Key: AMQ-2317
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2317
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.3.0
>         Environment: OS: MacOS X  10.5.7 MacBook Core 2 Duo 2 Ghz
> DBMS: MySQL 5.0.83 (through macports), SQLServer 2005 (in VMWare); others suspected but
> not thoroughly tested (including HSQL)
> All observations are against trunk: rev 790957 (2009-07-03 23:07:04 +0700 (Fri, 03 Jul
> 2009)) (fuse progress 5.3.0.3 and ActiveMQ 5.2.0 seem to have the same problem though)
>            Reporter: Daniel Mueller
>            Priority: Critical
>             Fix For: 5.4.0
>
>         Attachments: FailoverTransactionalTest.patch
>
>
> There is a race condition somewhere in the transaction/replay code involving failovers
> of JDBC-only Master/Slave configurations.
> Observed problems:
> If messages are sent to a master broker in one transaction, and the master fails over to
> the slave while the transaction is open, the messages seem to be replayed twice: the
> database holds duplicates (see the query at the end), and the broker reports a message
> count that includes the duplicates.
> Severity: 
> If the clients are connected to the new master and start consuming, the broker will not
> deliver dups. The dups will be delivered, though, if there is another failover (a common case
> for system upgrades). It seems that a single consumer will not get duplicates, even if it
> fails over again to a new broker, but if the consumer is restarted, it loses its state as well
> and subsequently gets the duplicates delivered.
> Attached is a testcase that demonstrates the problem. It shows that a single producer
> committing after each send creates one additional message in the broker with a duplicate
> MSGID_SEQ. If everything is committed in one transaction, then every single message in the
> transaction is duplicated (not only the ones sent before the failover occurred).
> The testcase uses an external MySQL instance, though, and needs DBCP and the MySQL
> JDBC connector on the classpath (the pom is patched in the attached file to resolve that automatically).
> Out of the 6 tests, the following almost always fail on my machine:
> testProducer_MasterFailoverByShutdown_AtRandomTimes_CommitPerMessage  (expected <6000>, but was <6001>)
> testProducer_MasterFailoverByShutdown_AtRandomTimes_OneCommit  (expected <6000>, but was <12000>)
> Rarely (3-5% of the cases) this one also fails:
> testProducer_MasterFailoverByShutdown_SingleMsgCommit_AfterCommit  (expected <500>, but was <501>)
> Other observations made:
> 1) The problem seems to be a race condition: while I was trying to find the cause through
> debugging, the problem disappeared when I set a breakpoint in TransactionInfo.visit(line:100).
> The race condition is hit on my machine (specs above) basically every time without interaction
> (from maven, on the shell with a build, inside eclipse both debugged and normal).
> 2) It seems that TransactionBroker.commitTransaction(line:100) is called once with duplicated
> synchronizations (2x size). MemoryTransactionStore$Tx(line:109), on the other hand, is called
> twice: first with the correct amount, and later with a doubled amount.
> 3) The problem is not reproducible with Kaha; it is related to JDBC.
> 4) It might be possible to have the testcase fail reliably with one of Derby/HSQL/H2,
> but I didn't investigate.
> 5) The testcase is not exactly very pretty, but it does show the problem ;)
> 6) The attached testcase is a patch against activemq-core.
> 7) The tests can be executed directly (in bash) with:
> env MAVEN_OPTS="$MAVEN_OPTS -Xmx800M" mvn -Dtest=org.apache.activemq.transport.failover.FailoverTransactionalTest test
> 8) For MySQL, the following query should list the duplicated messages:
> SELECT 
>       MSGID_PROD
>      ,MSGID_SEQ
>   FROM activemq_msgs
> GROUP BY MSGID_PROD,MSGID_SEQ
> HAVING ( COUNT(MSGID_SEQ) > 1 );
> 9) If you need the my.cnf for the database, I can attach that as well.
> 10) The tables are correctly created as InnoDB
> I think that's it...
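
For readers without a MySQL instance at hand, the duplicate check from item 8 can be sketched in plain Java. This is only an illustration, not ActiveMQ code: the class `DuplicateFinder`, the type `MessageRow`, and the sample producer ID are hypothetical names, and the rows are assumed to have already been read from activemq_msgs. It groups rows by (MSGID_PROD, MSGID_SEQ) and keeps the groups with more than one row, which is what the GROUP BY / HAVING query does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ActiveMQ: an in-memory equivalent of the
// GROUP BY MSGID_PROD, MSGID_SEQ ... HAVING COUNT(MSGID_SEQ) > 1 query.
public class DuplicateFinder {

    /** One stored message, reduced to the two columns the query groups by. */
    public static final class MessageRow {
        final String producerId; // corresponds to MSGID_PROD
        final long sequence;     // corresponds to MSGID_SEQ

        public MessageRow(String producerId, long sequence) {
            this.producerId = producerId;
            this.sequence = sequence;
        }
    }

    /**
     * Returns every "producerId#sequence" key that occurs more than once,
     * mapped to its number of occurrences, in first-seen order.
     */
    public static Map<String, Integer> findDuplicates(List<MessageRow> rows) {
        // Count rows per (producer, sequence) pair.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (MessageRow row : rows) {
            counts.merge(row.producerId + "#" + row.sequence, 1, Integer::sum);
        }
        // Keep only the pairs stored more than once (the HAVING clause).
        Map<String, Integer> duplicates = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.put(entry.getKey(), entry.getValue());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Made-up sample data: sequence 2 is stored twice, as after a replay.
        List<MessageRow> rows = new ArrayList<>();
        rows.add(new MessageRow("ID:producer-1", 1));
        rows.add(new MessageRow("ID:producer-1", 2));
        rows.add(new MessageRow("ID:producer-1", 2));
        rows.add(new MessageRow("ID:producer-1", 3));
        System.out.println(findDuplicates(rows));
    }
}
```

An empty result means no (producer, sequence) pair was stored twice; after a failed run of the attached testcase one would expect at least one entry.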

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

