qpid-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From acon...@apache.org
Subject svn commit: r1646618 - in /qpid/trunk/qpid/cpp/src/qpid: broker/AsyncCompletion.h ha/Primary.cpp
Date Fri, 19 Dec 2014 03:18:57 GMT
Author: aconway
Date: Fri Dec 19 03:18:57 2014
New Revision: 1646618

URL: http://svn.apache.org/r1646618
QPID-6278: HA broker abort in TXN soak test

The crash appears to be a race condition in async completion exposed by the HA
TX code code as follows:

1. Message received and placed on tx-replication queue, completion delayed till backups ack.
   Completion count goes up for each backup then down as each backup acks.
2. Prepare received, message placed on primary's local persistent queue.
   Completion count goes up one then down one for local store completion (null store in this

The race is something like this:
- last backup ack arrives (on backup IO thread) and drops completion count to 0.
- prepare arrives (on client thread) null store bumps count to 1 and immediately drops to
- both threads try to invoke the completion callback, one deletes it while the other is still

The old completion logic assumed that only one thread can see the atomic counter
go to 0.  It does not handle the count going to 0 in one thread and concurrently
being increased and decreased back to 0 in another. This case is introduced by
HA transactions because the same message is put onto a tx-replication queue and
then put again onto another persistent local queue, so there are two cycles of

The new logic fixes this only one call to completion callback is possible in all cases.

Also fixed missing lock in ha/Primary.cpp.


Modified: qpid/trunk/qpid/cpp/src/qpid/broker/AsyncCompletion.h
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/cpp/src/qpid/broker/AsyncCompletion.h?rev=1646618&r1=1646617&r2=1646618&view=diff
--- qpid/trunk/qpid/cpp/src/qpid/broker/AsyncCompletion.h (original)
+++ qpid/trunk/qpid/cpp/src/qpid/broker/AsyncCompletion.h Fri Dec 19 03:18:57 2014
@@ -111,13 +111,14 @@ class AsyncCompletion : public virtual R
         qpid::sys::Mutex::ScopedLock l(callbackLock);
         if (active) {
             if (callback.get()) {
+                boost::intrusive_ptr<Callback> save = callback;
+                callback = boost::intrusive_ptr<Callback>(); // Nobody else can run
                 inCallback = true;
                     qpid::sys::Mutex::ScopedUnlock ul(callbackLock);
-                    callback->completed(sync);
+                    save->completed(sync);
                 inCallback = false;
-                callback = boost::intrusive_ptr<Callback>();
             active = false;

Modified: qpid/trunk/qpid/cpp/src/qpid/ha/Primary.cpp
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/cpp/src/qpid/ha/Primary.cpp?rev=1646618&r1=1646617&r2=1646618&view=diff
--- qpid/trunk/qpid/cpp/src/qpid/ha/Primary.cpp (original)
+++ qpid/trunk/qpid/cpp/src/qpid/ha/Primary.cpp Fri Dec 19 03:18:57 2014
@@ -482,6 +482,7 @@ shared_ptr<PrimaryTxObserver> Primary::m
     shared_ptr<PrimaryTxObserver> observer =
         PrimaryTxObserver::create(*this, haBroker, txBuffer);
+    sys::Mutex::ScopedLock l(lock);
     txMap[observer->getTxQueue()->getName()] = observer;
     return observer;

To unsubscribe, e-mail: commits-unsubscribe@qpid.apache.org
For additional commands, e-mail: commits-help@qpid.apache.org

View raw message