qpid-users mailing list archives

From Gordon Sim <g...@redhat.com>
Subject Re: problem with qpid heartbeats when sending msgs with size over 1KB
Date Wed, 07 Dec 2011 14:37:27 GMT
On 12/07/2011 01:55 AM, Tom M wrote:
> Hello,
>
> we are having a problem with our MRG (qpid) system:
>
> * when sending messages with a size of 1600 bytes, a connection (used for
> sending from the client) does not detect, via heartbeat timeout, that the
> connection to the broker host is lost.
>
> + we are using C++ qpid client 0.7 and qpidd 0.7 (linux 2.6 x86_64 on both
> client and broker hosts) and an Ethernet connection (TCP/IP) between hosts
>
>      + for this connection we have: ConnectionSettings
> connectionSettings.heartbeat = 8
>
>      + simulating a system failure by pulling the ethernet cable to the
> broker host
>
>      + the connection-closed exception is caught by the client only after
> many minutes (6 to 20 mins); I'm guessing this is due to the TCP timeout
> and not the missed heartbeats.
>
>      + with the exact same application (for our client), if sending messages
> of 200 bytes, we do get the qpid exception indicating the connection closed
> (catch TransportFailure Exception: connection closed) within 16 seconds.
> For this testing, there were no other changes between the 2 cases, other
> than the size of the messages sent from the client (only expanded the size
> of the string in the body of the message) (1 message sent per second in
> both cases).
>
> * is this a known problem with qpid 0.7?

No, I don't think this is a known issue.

> * is there a patch to fix this for qpid 0.7?
>
> * has this problem already been fixed in later releases?
>
> NOTE: we have already deployed qpid 0.7 in our system, and we will not be
> able to upgrade to a newer full release for many months.
>
> I'm wondering if the problem is that the connection gets blocked with the
> first TCP packet of a multiple packet message, such that the heartbeat
> detection is disabled until the full message is sent. But, if the
> multi-packet message can not complete (since socket is broken), the
> heartbeat logic is held disabled until the multi-packet message can
> complete (which in this case it can not).

There is nothing that directly (intentionally) does anything like this. 
However it may be possible that there is some deadlock or liveness issue 
that prevents correct function in some cases.

Is the test always failing with the larger message size? There is 
actually no difference in the AMQP framing for a 200-byte versus a 
1600-byte message. It may just be that the different timing of the larger 
write somehow triggers the issue.

Can you get trace level logs and a thread dump from the client for a 
failed case?
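
For readers following the thread, here is a minimal sketch of the kind of 
heartbeat timeout check under discussion. It is not qpid's actual code: 
HeartbeatMonitor and its methods are hypothetical names, and the timeout of 
twice the heartbeat interval is an assumption inferred from the report above 
(heartbeat = 8, failure detected within 16 seconds). The point it 
illustrates is that the expiry check depends only on when traffic was last 
received, so it should fire regardless of whether a write is still pending.

```cpp
#include <chrono>

// Hypothetical helper for illustration only; not part of the qpid client API.
class HeartbeatMonitor {
public:
    using Clock = std::chrono::steady_clock;

    // Assumption: the peer is considered lost after two missed intervals.
    HeartbeatMonitor(std::chrono::seconds interval, Clock::time_point now)
        : timeout_(2 * interval), lastSeen_(now) {}

    // Call whenever any frame (data or heartbeat) arrives from the peer.
    void onTrafficReceived(Clock::time_point now) { lastSeen_ = now; }

    // True once the peer has been silent for at least the timeout.
    // This looks only at received traffic, never at pending writes.
    bool isExpired(Clock::time_point now) const {
        return now - lastSeen_ >= timeout_;
    }

private:
    std::chrono::seconds timeout_;
    Clock::time_point lastSeen_;
};
```

In such a design, isExpired() would be polled from a timer that runs 
independently of the thread doing socket writes. If the check only ran 
after a write completed, a write stalled on a broken socket could delay 
detection until the TCP timeout, which would match the symptom described.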

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org

