Message-ID: <4AF42E4D.9010909@redhat.com>
Date: Fri, 06 Nov 2009 09:10:21 -0500
From: Ted Ross
To: users@qpid.apache.org
Subject: Re: Network Outage Causes
Message Loss on Federated Routes
In-Reply-To: <951EAAA951E3AD4B82B5B30AA2A0AF2BDC967B7A64@Commitchs1.commitent.com>

On 11/05/2009 11:55 AM, Cullen Davis wrote:
> Our Qpid-based product will be deployed with brokers federated across networks that are characterized as "disconnected, interrupted, and low-bandwidth". We are using a dedicated hardware-based network shaper to simulate the network conditions in order to test our solution.
>
> Qpid is performing very well in the tests involving high bit error rates, packet loss, and high latencies. However, our solution is not meeting threshold objectives in tests involving extended network outages (packet loss = 100%).
>
> Our solution utilizes Qpid 0.5 C++ brokers and clients running on Red Hat Enterprise Linux 5.4. The brokers use direct exchanges and have been federated as follows:
>
>     qpid-route --durable dynamic add brokerB brokerA fed.direct
>
> The qpid-route command created a new queue, named "bridge-queue", at brokerA. The new queue had queue properties of durable=False, exclusive=True and autoDelete=True.
>
> Our test begins with 1000 messages being published into broker A at a rate of 1 per second. The network connection between broker A and broker B is set to run at 56kbps for 5 minutes and then degrade to a network outage stage (100% packet loss) for 15 minutes.
>
> The test begins and broker B starts receiving the messages through the federated route at a frequency of 1 per second. About seven minutes into the network outage stage, broker A throws a timeout error:
>
>     Connection timed out: closing
>     DISCONNECTED 150.nnn.nnn.nnn (broker B's ip)
>
> This results in the bridge-queue on broker A being deleted.
> When the network connection is re-established, the bridge-queue is rebuilt, but none of the messages that were published into broker A during the network outage were federated to broker B. Essentially, this means that broker B never receives more than half of the messages received by broker A.
>
> The current theory is that the federated route is backed by a bridge-queue with an autoDelete property of true. When the network outage occurs, the queue is deleted and the message counts are flushed. The durable flag on the route causes the bridge-queue to be rebuilt when the brokers reconnect, but there is no way for the bridge-queue to establish which messages have not been federated. Could setting the autoDelete property to false fix the problem? I am unsure of how to properly set this property on a "system management" queue.
>
> Any thoughts on how to properly configure a broker link / route that can survive extended network outages would be greatly appreciated.
>
> Cullen J. Davis
> CommIT Enterprises, Inc.
>
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project: http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org

Your theory is correct. An "exchange" route causes a temporary transit queue to be created to hold messages waiting to be sent from broker to broker. Even though the route is durable, meaning it will be re-established after a broker restart, the temporary queue is not (it is exclusive/auto-delete), and any messages in the queue when a restart occurs will be lost.

You can use a "queue" route where, rather than connecting to a remote exchange, the destination broker subscribes to an existing queue. This queue can be non-exclusive and durable. Be sure to use the --ack N option in qpid-route, where N is a number greater than zero. This will cause the inter-broker route to use message acknowledgement in such a way that recovery will be clean (i.e.
the source broker will not discard messages from the queue until they are acknowledged by the destination broker).

The downside of the queue route solution is that you don't get the dynamic binding behavior.

It is possible (though not implemented) to use durable transit queues when durable routes are created so that no messages would ever be lost in the event of broker failure.

-Ted
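
For illustration, the queue-route setup described above might be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe. The broker names and fed.direct come from the original post, while the queue name fed.transit and the binding key my.key are hypothetical. It also assumes the installed qpid-config and qpid-route accept the options in the positions shown, and that fed.direct exists on both brokers. By default the script only prints the commands; set APPLY=1 to execute them.

```shell
#!/bin/sh
# Sketch only: prints each command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Arguments: dest-broker src-broker exchange queue binding-key ack-batch
# (the queue and binding-key values used below are hypothetical)
setup_queue_route() {
    dest=$1; src=$2; exchange=$3; queue=$4; key=$5; ack=$6

    # 1. Declare a durable, non-exclusive transit queue on the source broker,
    #    so it is not deleted when the inter-broker connection drops.
    run qpid-config -a "$src" --durable add queue "$queue"

    # 2. Bind the queue on the source broker so published messages land in it.
    run qpid-config -a "$src" bind "$exchange" "$queue" "$key"

    # 3. Create a durable queue route: the destination broker subscribes to the
    #    source queue and acknowledges transfers in batches of $ack, so the
    #    source keeps messages until the destination has confirmed them.
    run qpid-route --durable --ack "$ack" queue add "$dest" "$src" "$exchange" "$queue"
}

setup_queue_route brokerB brokerA fed.direct fed.transit my.key 1
```

Because a queue route has no dynamic binding behavior, each routing key of interest would need its own explicit binding in step 2.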