qpid-dev mailing list archives

From "Chuck Rolke (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DISPATCH-1110) Intermittent router hang while running QIT's AMQP large content test
Date Thu, 04 Oct 2018 19:57:00 GMT

    [ https://issues.apache.org/jira/browse/DISPATCH-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638770#comment-16638770 ]

Chuck Rolke commented on DISPATCH-1110:

qpid-interop-test (qit) prints the error message to the sender's stderr when this error
occurs. Unfortunately, due to a defect in the qit implementation, a user at the test console
never sees it: qit throws away Sender and Receiver output if the process hangs and must be killed.
An improvement for qit would be to print stdout and stderr from the Sender and Receiver even
if those processes hang. The trick I used to get the error message to show up without touching
qit internals is to wait for the process to hang and then kill it by pid.
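
The improvement suggested above could look roughly like the following sketch. It is not
qit code: the helper name run_shim, the timeout value, and the stand-in child command are
assumptions made for illustration. It relies only on Python's standard subprocess module
to kill a hung child and still return whatever the child wrote to stdout and stderr.

```python
import subprocess
import sys


def run_shim(cmd, timeout_s):
    """Run cmd; if it hangs past timeout_s, kill it but keep its output.

    Hypothetical helper, not part of qit.
    Returns (returncode, stdout, stderr, timed_out).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    timed_out = False
    try:
        out, err = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is hung: kill it, then read whatever it already wrote.
        timed_out = True
        proc.kill()
        out, err = proc.communicate()
    return proc.returncode, out, err, timed_out


if __name__ == "__main__":
    # Stand-in for a hung Sender: writes one line to stderr, then sleeps.
    child = [
        sys.executable,
        "-c",
        "import sys, time; print('example error', file=sys.stderr, flush=True); "
        "time.sleep(60)",
    ]
    rc, out, err, timed_out = run_shim(child, timeout_s=3)
    print("timed out:", timed_out)
    sys.stdout.write(out)
    sys.stderr.write(err)
```

The second communicate() call after kill() is what returns the output already written by
the child, so the error text is still available even though the process had to be killed.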

The message I get is:
amqp_large_content_test::Sender::on_connection_error: amqp:session:invalid-field: sequencing error, expected delivery-id 6, got 5

Knowing this, the next step is to figure out where the error is coming from.
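
For readers unfamiliar with this error text, here is a simplified model of the kind of
check that produces it. In AMQP 1.0, delivery-ids on a session are sequential, and all
transfer frames of one multi-frame delivery carry the same delivery-id. The class and
method names below are hypothetical and this is not the Proton implementation; it only
shows one way such a message could arise, and is not a diagnosis of this issue.

```python
class SequencingError(Exception):
    """Raised when a transfer frame carries an unexpected delivery-id."""


class IncomingSession:
    """Simplified per-session delivery-id check (illustrative only)."""

    def __init__(self, first_id=0):
        self.next_id = first_id  # id expected for the next NEW delivery
        self.open_id = None      # id of a multi-frame delivery in progress

    def on_transfer(self, delivery_id, more):
        # While a delivery is in progress, frames must repeat its id;
        # otherwise the frame must start the next delivery in sequence.
        expected = self.open_id if self.open_id is not None else self.next_id
        if delivery_id != expected:
            raise SequencingError(
                "sequencing error, expected delivery-id %d, got %d"
                % (expected, delivery_id)
            )
        if self.open_id is None:
            self.open_id = delivery_id
            self.next_id = delivery_id + 1
        if not more:
            # Last frame of this delivery: the next frame starts a new one.
            self.open_id = None


if __name__ == "__main__":
    session = IncomingSession(first_id=5)
    session.on_transfer(5, more=True)    # first frame of delivery 5
    session.on_transfer(5, more=False)   # final frame of delivery 5
    try:
        session.on_transfer(5, more=False)  # id 5 again after it completed
    except SequencingError as exc:
        print(exc)
```

In this model, a frame that reuses delivery-id 5 after delivery 5 has completed is
rejected, because the session already expects 6.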

> Intermittent router hang while running QIT's AMQP large content test
> --------------------------------------------------------------------
>                 Key: DISPATCH-1110
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1110
>             Project: Qpid Dispatch
>          Issue Type: Bug
>         Environment: Standard QIT environment.
> Once QIT is built and installed, the environment is set using the config.sh file. See
> QUICKSTART for details.
>            Reporter: Kim van der Riet
>            Assignee: Ganesh Murthy
>            Priority: Major
>         Attachments: qdrouterd.conf
> When running the Qpid Interop Test's AMQP large content test, a stand-alone router will
> intermittently hang and cause the test to time out.
> The failure appears to be limited to either the AMQP list or map types, and usually with
> the C++ client as the message sender.  The C++, Python2 and Python3 as receiver clients
> have all seen this failure, but the Python2 receiver client seems to reproduce more readily
> on my hardware.
> In all cases, the test fails when the router sends what I suppose is the final transfer
> of a large message (I have not added up/counted the bytes of the many preceding transfers)
> to the consumer. The consumer then sends a disposition, but the router does not respond again
> until the test times out. The consumer can be seen to send heartbeats to the router, but the
> router does not send any of its own.
> {noformat}
> ... (plenty of 65550-sized frames R->C)
> R->C 5976	3.454766	::1	::1	AMQP	65550
> R->C 5977	3.454775	::1	::1	AMQP	65550
> R->C 5978	3.454783	::1	::1	AMQP	48171
> C->R 5982	3.529881	::1	::1	AMQP	115	disposition
> C->R 5984	7.530704	::1	::1	AMQP	94	(empty)
> C->R 5986	11.532306	::1	::1	AMQP	94	(empty)
> ...{noformat}
> There are no errors to be seen in the router logs other than when the consuming client
> is killed owing to the test timeout.
> {noformat}
> ...
> 2018-08-29 12:50:23.191754 -0400 SERVER (info) [14]: Accepted connection to ::1:amqp from ::1:37262
> 2018-08-29 12:51:19.562695 -0400 SERVER (info) [14]: Connection from ::1:37262 (to ::1:amqp) failed: amqp:connection:framing-error connection aborted
> {noformat}
> The reproducer is not very tight on this, and the error occurs about 50% of the time
> on my hardware.
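
The quoted description notes that the bytes of the preceding transfers were not added up.
A minimal sketch of how that could be done from a frame listing in the format quoted
above follows. The function name sum_frame_bytes is hypothetical, the column layout is
assumed from the quoted trace, and elided lines ("...") are skipped, so the totals only
cover frames that actually appear in the input.

```python
from collections import defaultdict


def sum_frame_bytes(trace_lines):
    """Sum the frame-size column per direction (R->C or C->R).

    Assumed layout: direction, frame number, time, src, dst, protocol, size,
    then an optional note. Lines that do not match are ignored.
    """
    totals = defaultdict(int)
    for line in trace_lines:
        # Drop any leading email quote marker, then split on whitespace.
        fields = line.lstrip("> ").split()
        if len(fields) < 7 or fields[0] not in ("R->C", "C->R"):
            continue
        if fields[5] != "AMQP" or not fields[6].isdigit():
            continue
        totals[fields[0]] += int(fields[6])
    return dict(totals)


if __name__ == "__main__":
    trace = [
        "> ... (plenty of 65550-sized frames R->C)",
        "> R->C 5976\t3.454766\t::1\t::1\tAMQP\t65550",
        "> R->C 5977\t3.454775\t::1\t::1\tAMQP\t65550",
        "> R->C 5978\t3.454783\t::1\t::1\tAMQP\t48171",
        "> C->R 5982\t3.529881\t::1\t::1\tAMQP\t115\tdisposition",
    ]
    print(sum_frame_bytes(trace))
```

Run over a complete capture rather than the excerpt, the R->C total could be compared
with the expected message size to check whether the last frame really is the final transfer.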
