qpid-users mailing list archives

From Cullen Davis <cullen.da...@commitent.com>
Subject Network Outage Causes Message Loss on Federated Routes
Date Thu, 05 Nov 2009 16:55:32 GMT
Our Qpid-based product will be deployed with federated brokers on networks that are
characterized as "disconnected, interrupted, and low-bandwidth".  We are using a dedicated
hardware-based network shaper to simulate these network conditions in order to test our solution.

Qpid is performing very well in the tests involving high bit error rates, packet loss, and
high latencies.  However, our solution is not meeting threshold objectives in tests involving
extended network outages (packet loss = 100%).

Our solution utilizes Qpid 0.5 C++ brokers and clients running on RedHat Enterprise Linux
5.4.  The brokers are utilizing direct exchanges and have been federated as follows:
  qpid-route  --durable dynamic add brokerB brokerA fed.direct

The qpid-route command created a new queue, named "bridge-queue" at brokerA.  The new queue
had queue properties of durable=False, exclusive=True and autoDelete=True.  

Our test begins with 1000 messages being published into broker A at a rate of 1 per second.
The network connection between broker A and broker B is set to run at 56kbps for 5 minutes
and then degrade to a network outage stage (100% packet loss) for 15 minutes.

The test begins and broker B starts receiving the messages through the federated route at
a frequency of 1 per second.  About seven minutes into the network outage stage, broker A
throws a timeout error:
  Connection timed out: closing
  DISCONNECTED 150.nnn.nnn.nnn (broker B's ip)

This results in the bridge-queue on broker A being deleted.  When the network connection is
re-established, the bridge-queue is rebuilt, but none of the messages that were published
into broker A during the network outage were federated to broker B.  Essentially, this means
that broker B never receives more than half of the messages received by broker A.
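
As a rough cross-check of the "less than half" figure, the test schedule can be turned into
message counts.  This is only a back-of-the-envelope sketch: it assumes message i is published
at second i and that the timeout fires exactly 7 minutes into the outage, neither of which
was measured precisely.

```python
# Back-of-the-envelope timeline for the test described above.
# Assumptions (not measured): message i is published at second i,
# and the connection timeout fires exactly 7 minutes into the outage.
PUBLISH_RATE = 1               # messages per second
TOTAL_MESSAGES = 1000
DEGRADED_SECS = 5 * 60         # 56kbps stage
OUTAGE_SECS = 15 * 60          # 100% packet loss stage
TIMEOUT_INTO_OUTAGE = 7 * 60   # "about seven minutes" into the outage


def count_published(start, end):
    """Messages published in [start, end) seconds, capped at TOTAL_MESSAGES."""
    first = min(start * PUBLISH_RATE, TOTAL_MESSAGES)
    last = min(end * PUBLISH_RATE, TOTAL_MESSAGES)
    return last - first


outage_start = DEGRADED_SECS
timeout_at = outage_start + TIMEOUT_INTO_OUTAGE
outage_end = outage_start + OUTAGE_SECS

before_outage = count_published(0, outage_start)            # link still up
until_timeout = count_published(outage_start, timeout_at)   # link down, bridge-queue still exists
after_timeout = count_published(timeout_at, outage_end)     # bridge-queue already deleted

print(before_outage, until_timeout, after_timeout)
print("upper bound delivered to broker B: %.0f%%"
      % (100.0 * before_outage / TOTAL_MESSAGES))
```

Under these assumptions all 1000 messages have been published before the link returns, and only
the messages published during the 56kbps stage had a chance to cross the route.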

The current theory is that the federated route is backed by a bridge-queue with an autoDelete
property of True.  When the network outage occurs, the queue is deleted and the messages it
held are discarded.  The durable flag on the route causes the bridge-queue to be rebuilt when the
brokers reconnect, but there is no way for the rebuilt bridge-queue to establish which messages
have not been federated.  Could setting the autoDelete property to False fix the problem?  I am
unsure of how to properly set this property on a "system management" queue.
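
The theory can be illustrated with a toy model.  The sketch below is not Qpid code and makes no
claim about how the broker is implemented; ToyBroker and run are hypothetical names, and
"disconnect" stands for the moment the connection times out.  It only shows why a queue that
is deleted on disconnect loses everything published while the link is down.

```python
class ToyBroker:
    """Toy model of a source broker with a single bridge queue (not Qpid code)."""

    def __init__(self, auto_delete):
        self.auto_delete = auto_delete
        self.bridge_queue = []    # None means the queue does not exist
        self.connected = True
        self.delivered = []       # what the destination broker has received

    def publish(self, msg):
        # A message is only retained for the route if the bridge queue exists.
        if self.bridge_queue is not None:
            self.bridge_queue.append(msg)
        if self.connected:
            self._drain()

    def disconnect(self):
        # Models the connection timeout on the source broker.
        self.connected = False
        if self.auto_delete:
            self.bridge_queue = None   # queue and its contents are dropped

    def reconnect(self):
        self.connected = True
        if self.bridge_queue is None:
            self.bridge_queue = []     # route is rebuilt with an empty queue
        self._drain()

    def _drain(self):
        self.delivered.extend(self.bridge_queue)
        del self.bridge_queue[:]


def run(auto_delete):
    """Publish 3 messages, drop the link, publish 3 more, restore, publish 2."""
    broker = ToyBroker(auto_delete)
    for msg in range(0, 3):
        broker.publish(msg)
    broker.disconnect()
    for msg in range(3, 6):
        broker.publish(msg)
    broker.reconnect()
    for msg in range(6, 8):
        broker.publish(msg)
    return broker.delivered


print("autoDelete=True :", run(True))
print("autoDelete=False:", run(False))
```

In the model, the auto-delete variant never delivers the messages published during the outage,
while the variant that keeps its queue delivers them after reconnecting.  Whether the real
bridge-queue can be configured this way is exactly the open question.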

Any thoughts on how to properly configure a broker link / route that can survive extended
network outages would be greatly appreciated.

Cullen J. Davis
CommIT Enterprises, Inc.

