qpid-users mailing list archives

From Matt Broadstone <mbroa...@gmail.com>
Subject Re: spurious ABRT in qpidd
Date Mon, 11 Jan 2016 21:29:07 GMT
On Mon, Jan 11, 2016 at 3:47 PM, Matt Broadstone <mbroadst@gmail.com> wrote:

> On Mon, Jan 11, 2016 at 1:18 PM, Matt Broadstone <mbroadst@gmail.com>
> wrote:
>> On Mon, Jan 11, 2016 at 1:15 PM, Gordon Sim <gsim@redhat.com> wrote:
>>> On 01/11/2016 05:37 PM, Matt Broadstone wrote:
>>>> I'm having trouble tracking down the root cause of a SIGABRT in
>>>> qpidd, and was hoping for some advice from the list. Specifically,
>>>> it seems that after a period of little to no activity, a large burst
>>>> of traffic hits the broker, and the only information we see in the
>>>> logs is:
>>>> Jan 11 15:58:33 test-box kernel: [  652.903997] init: qpidd main process (2239) killed by ABRT signal
>>>> Jan 11 15:58:33 test-box kernel: [  652.911661] init: qpidd main process ended, respawning
>>>> We're running Ubuntu 14.04 (trusty) on this machine, with the packages
>>>> from the official qpid PPA. I tried running the services with trace
>>>> logging enabled, to no avail (there were no strange packets and no
>>>> error messages about failed assertions). Attaching gdb to the process
>>>> also yielded no relevant information, so I'm running out of ideas for
>>>> what to try next. AFAICT the only `abort()` in the codebase is in the
>>>> assertion code, which would print something about the assertion
>>>> failure.
>>>> Any thoughts on what I might try to help resolve this issue?
>>> Could it be a memory issue? I.e. the qpidd processes exceeding some
>>> memory limit and being killed by the oom killer?
>> I thought so at first too, but I believe we would see a kernel message
>> related to that if that were the case, not to mention that the server had
>> something like 120GB of free RAM at the time as well. I'm quite willing to
>> test that theory more thoroughly if you have a recommended means of doing
>> so?
> I wish I could tell you that I have a quick, reproducible test case that
> uses only qpid code; unfortunately we don't, as it always happens in
> concert with a number of other services. Is it possible that this could be
> triggered from
> some proton code and that no message would be output? We're actually
> experiencing this problem on a loop on one of our servers currently and
> there's something like 452GB free memory, so I'm inclined to rule out
> memory as the root cause.
Looks like I did just get a valid assertion, but I'm not even clear anymore
whether this is related to the initially reported bug (since we're trying
to throw the kitchen sink at it to cause ABRTs). The log indicated:

2016-01-11 21:21:21 [Broker] debug clean(): 0 messages remain; head is now 0
2016-01-11 21:21:21 [Broker] trace qpid.
handling outgoing delivery for 0x7f3fdc262590 on session 0x7f3fdc111650
virtual void qpid::broker::amqp::OutgoingFromQueue::handle(pn_delivery_t*):
Assertion `r.delivery == delivery' failed.

So in the current test we are back on 0.34 built with proton 0.10. This is
the initial configuration that caused us pain, but for the past few days we
had been testing against trunk (0.35) with proton 0.11.1, which exhibited
the same problem as initially reported. Hope that helps.
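
For anyone trying to tell these cases apart from a test harness, here is a minimal sketch of classifying how a child process died. It is not qpid code: `describe_exit` and `run_and_describe` are hypothetical helper names, and the child here is just a Python one-liner that calls `abort()`, standing in for a broker hitting a failed assertion. The point is only that an `abort()` shows up as SIGABRT, whereas the kernel OOM killer terminates processes with SIGKILL.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Describe a subprocess return code as a normal exit or a signal.

    On POSIX, subprocess reports death-by-signal as a negative return
    code whose absolute value is the signal number.
    """
    if returncode >= 0:
        return "exited with status {}".format(returncode)
    return "killed by {}".format(signal.Signals(-returncode).name)


def run_and_describe(argv):
    """Run a command (hypothetical stand-in for the broker) and classify its exit."""
    proc = subprocess.run(argv, stderr=subprocess.DEVNULL)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # The child calls abort(), which is what a failed C/C++ assert() does.
    print(run_and_describe([sys.executable, "-c", "import os; os.abort()"]))
```

Under these assumptions, a SIGABRT result points at `abort()` (an assertion or similar) rather than at memory pressure, which fits the "killed by ABRT signal" line in the kernel log.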

>> Matt
