qpid-users mailing list archives

From Gordon Sim <g...@redhat.com>
Subject Re: System stalling
Date Tue, 17 Sep 2013 10:43:47 GMT
On 09/12/2013 03:39 PM, Jimmy Jones wrote:
>>> Hi,
>>> I've finally managed to isolate the issue and can reproduce it with the
>>> attached scripts. Running rx-test.pl followed by tx-test.pl results in a
>>> system where the receiver can keep up with the producer (gets a message
>>> every <1s) (tx-test 118% CPU, qpidd 97% CPU, rx-test 60% CPU). However,
>>> if you stop rx-test and restart it (even after only a second or so), it
>>> starts to take 2s+ to receive messages, going up to about 6s on my
>>> system, so the ring quickly fills and overflows. Even if the producer is
>>> then stopped, messages are still only received every 3s - with qpidd on
>>> 100% CPU and the receiver on 5%. Also the resident size of qpidd reaches
>>> 5GB, yet the queue is only 2GB.
>>> Hopefully I can now regain my sanity :)
>> Well done! Unfortunately your scripts seem to have been stripped off at
>> some stage. Could you attach them to a JIRA perhaps? This was with 0.22,
>> right?
> Created QPID-5135.

Thanks, Jimmy. I have been looking into this issue a little more. I 
couldn't exactly duplicate your numbers as my test machine did not have 
sufficient memory but I believe I have identified the key symptom (JIRA 
updated accordingly), though as yet not the root cause.

As noted in the JIRA, it may be possible to tune your receivers to 
mitigate the issue. How feasible that is probably depends on how closely 
your real system follows the test scenario in the JIRA. For large 
messages, reducing the capacity seems to be the most effective 
improvement. As message size decreases, acknowledging in larger batches 
becomes more effective.
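
To illustrate the two settings mentioned above, here is a minimal sketch of a receive loop written against the Python qpid.messaging client (the scripts in the JIRA are Perl; Python is used here only for illustration). It sets a small receiver capacity (prefetch) and acknowledges once per batch instead of once per message. The function name consume, the ack_every parameter and the default values are hypothetical placeholders, not figures from the JIRA; suitable values would have to be found by testing against the real message sizes.

```python
try:
    # Raised by Receiver.fetch() when the timeout expires with no message.
    from qpid.messaging import Empty
except ImportError:
    # Assumption: a stand-in so the sketch can be read and exercised
    # without the qpid client installed.
    class Empty(Exception):
        pass


def consume(session, receiver, handle, capacity=1, ack_every=100, timeout=1.0):
    """Drain a receiver, acknowledging once per batch of ack_every messages.

    session and receiver are expected to behave like qpid.messaging
    Session and Receiver objects; handle is any callable taking a message.
    Returns the number of messages processed.
    """
    # A low capacity limits how many messages the broker pushes ahead of
    # the application; this is the setting suggested for large messages.
    receiver.capacity = capacity

    total = 0
    unacked = 0
    while True:
        try:
            msg = receiver.fetch(timeout=timeout)
        except Empty:
            break  # nothing arrived within the timeout, stop draining
        handle(msg)
        total += 1
        unacked += 1
        if unacked >= ack_every:
            # With no arguments, acknowledge() covers every message
            # received on the session that is not yet acknowledged.
            session.acknowledge()
            unacked = 0

    if unacked:
        session.acknowledge()  # flush the final partial batch
    return total


# Hypothetical usage (broker address and queue name are placeholders):
#   from qpid.messaging import Connection
#   conn = Connection("localhost:5672")
#   conn.open()
#   session = conn.session()
#   receiver = session.receiver("my-ring-queue")
#   consume(session, receiver, handle=lambda m: None, capacity=1, ack_every=100)
#   conn.close()
```

For large messages one would try a small capacity first; for small messages, raising ack_every is the parameter to experiment with.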

One other question was just to confirm that the case as reported does 
match your real system. Initially there was a suspicion that the ingest 
process was blocked on send, which I think would be a different issue.

I'll do some more digging on what the root cause of the drop in 
throughput for large messages on a full ring queue might be and update 
the JIRA with any progress.
