qpid-users mailing list archives

From Fraser Adams <fraser.ad...@blueyonder.co.uk>
Subject Re: C++ broker memory leak in federated set-up???
Date Thu, 15 Mar 2012 17:27:36 GMT

> Just btw, you can mark the queue as durable without making the message 
> persistent, in which case there would be no performance penalty.
Thanks, yeah I realise that, but you have to explicitly mark the 
messages as not persistent; certainly for JMS the spec says persistent 
should be the default, and I *thought* it was the default for the C++ 
clients too. It's something I'm going to look into, but the producers 
aren't directly under my control so I didn't want to introduce a 
dependency at that stage in the project. It's less of an issue now as 
the folks in charge of that aspect are pretty onside. However, TBH I 
*really* wish the queue configuration persistence aspects and the 
message persistence aspects were separately configurable.
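
For what it's worth, the thing I'd like to be able to do routinely looks 
roughly like this with the Python messaging API (just a sketch, not 
tested; the queue name is made up and the address options are the 
0.8-era syntax as I understand it): a durable queue declaration, with 
each message explicitly opting out of persistence.

    from qpid.messaging import Connection, Message

    connection = Connection("localhost:5672")
    connection.open()
    try:
        session = connection.session()
        # The queue configuration is durable (survives a broker restart)...
        sender = session.sender(
            "logging-queue; {create: always, node: {durable: True}}")
        # ...but each individual message is explicitly transient.
        message = Message("some payload")
        message.durable = False
        sender.send(message)
    finally:
        connection.close()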

Just to make you laugh, we also just got bitten by another comedy 
moment: we hadn't realised there was a bug in the 0.8 Python messaging 
API implementation that transparently makes queues durable by default. 
We set up a logging client on another part of our system with a circular 
queue of decent size, and last weekend our operational system got 
totalled. What happened was the consumer client died and, because the 
queue was persistent (unbeknownst to us), it had the default journal 
settings, so it hit the threshold-exceeded condition on the journal size 
rather than on the queue byte size - that blew all our federated links.
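
In case it saves anyone else a ruined weekend, what we should have done 
from the start is spell out both the durability and the ring policy in 
the address rather than trusting the defaults - roughly like this 
(again just a sketch; the 100MB limit is an arbitrary illustrative 
number and the argument names are the qpid.* queue options as I 
understand them):

    from qpid.messaging import Connection, Message

    connection = Connection("localhost:5672")
    connection.open()
    try:
        session = connection.session()
        # Explicitly non-durable, with an explicit ring (circular) policy
        # and byte limit, so nothing falls back to broker/journal defaults.
        sender = session.sender(
            "logging-queue; {create: always, node: {durable: False,"
            " x-declare: {arguments:"
            " {'qpid.policy_type': ring, 'qpid.max_size': 104857600}}}}")
        sender.send(Message("log entry"))
    finally:
        connection.close()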

Monday wasn't a good day :-( If I didn't laugh I'd cry..... Fortunately 
that was exactly why queue routes were a good choice: most of the data 
had backed up on the queues on the other side of the link.

>
>> So it's not all that complicated, but it's driving me nuts that when the
>> source broker is co-located with the producer we have a memory increase,
>> but when we host the source broker on a different box it seems to be 
>> fine.
>
> That is very weird and my top suspicion would be that there were 
> different versions of qpid on the two boxes. Can you run colocated on 
> the 'other' box to rule that in or out? Or have you verified that 
> they are using the same version of qpid?
Well, interestingly, that was originally the case: the producer and source 
broker were running 0.6 and the destination broker 0.8. I managed to 
persuade the folks in charge of the producer system to do a build with 0.8.

One of the things I need to try is to stand up two instances of the 
producer and see if I get a problem if each writes to the other's broker.

As it happens, when they moved to 0.8 the throughput improved a lot, even 
before we fixed the network madness that forced the producer NIC into 
acting as 10BASE-T half duplex :-( However, with 0.8 brokers all round 
and a sane network we get great message throughput, but we still see the 
memory growing. It's a lot less pronounced than before, but as I say, 
if we point the producer to a broker on a different host it seems to be 
stable.
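
To put some numbers on "the memory growing" I've started sampling the 
broker's resident set size periodically so we can compare the co-located 
and remote runs side by side - something along these lines (a rough 
sketch; the one-minute interval and the output format are arbitrary):

    import sys, time

    def rss_kb(pid):
        # The VmRSS line in /proc/<pid>/status looks like: "VmRSS:  123456 kB"
        with open("/proc/%s/status" % pid) as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        return 0

    if __name__ == "__main__":
        qpidd_pid = sys.argv[1]   # pid of the broker under test
        while True:
            print("%s %d kB" % (time.strftime("%H:%M:%S"), rss_kb(qpidd_pid)))
            time.sleep(60)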

I *assume* that there are no sneaky little optimisations going on under 
the hood when client and broker are located on the same host, like 
"let's do some memory mapping and bypass the TCP/IP stack" :-) I'm 
guessing not but stranger things have happened.....

>
> It's a RHEL6-only issue relating to memory allocators in glibc. If both 
> boxes are 5.4 we can rule that out.
That's useful to know, though to be honest it's a pity - at least that 
might have preserved my sanity. So presumably this really is only RHEL6? 
What version of glibc are we talking about - is it one that doesn't 
actually run on 5.4? The reason I ask is that the producer is a 
high-performance, near-real-time system, so they *just may* have got some 
up-to-date versions of things installed. They probably don't use it for 
any whizzy allocation, as they use their own memory pooling mechanism to 
improve multithreaded performance.

That has me thinking I'll check with them that they don't do any fancy 
LD_PRELOAD stuff to override the underlying allocator in a way that might 
affect other processes. I'm fairly sure they don't, but it's worth checking.
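
If it helps anyone else, the check itself is easy enough to script 
against /proc - something like this quick sketch, which just looks for 
LD_PRELOAD in the process environment and for tcmalloc/jemalloc mapped 
into the address space (those library names are only examples of what 
might be preloaded):

    import sys

    def check_allocator(pid):
        # /proc/<pid>/environ is NUL-separated
        environ = open("/proc/%s/environ" % pid).read().split("\0")
        preload = [e for e in environ if e.startswith("LD_PRELOAD=")]
        print("LD_PRELOAD: %s" % (preload[0] if preload else "not set"))

        # Look for common alternative allocators in the memory map
        mappings = open("/proc/%s/maps" % pid).read()
        for lib in ("tcmalloc", "jemalloc"):
            if lib in mappings:
                print("%s is mapped into process %s" % (lib, pid))

    if __name__ == "__main__":
        check_allocator(sys.argv[1])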

> To be honest I'm stumped, I'm afraid and can only offer some 
> suggestions on what I might do to search for any further clues...
>
> Just to confirm, you have run qpid-stat -c, qpid-stat -q and qpid-stat 
> -u against a bloating broker? And everything shown there is as 
> expected (not much queue depth, message counts correlating, no 
> unexpected activity)?
It all *looks* as I'd expect. Clearly, when we had the network problems 
the queue on the route was filling up and eventually circling round, but 
now the depth floats around one or two items.

>
> When the memory growth starts happening, if you delete and recreate 
> the bridge does that have any effect on growth?
That's not something we've tried; it's worth looking into. Of course 
Murphy's law has generally kicked in and made the problem happen most 
often at night :-/ I'm wondering if it's bad karma and I did something 
awful in a past life :->

>
> Is it reproducible at all with more detailed logging (ideally 
> --log-enable info+ --log-enable trace+:amqp_0_10)? Obviously logs like 
> that grow pretty quickly so depending on the scale of the leak that 
> may not be feasible. It might give some clues though (then again it 
> might not :-(). Perhaps even a short run from both co-located and 
> remote cases to see if a comparison shows anything up?
>
It's worth a try.


I'm still suspicious of acks, as it's the only thing I've ever seen 
cause qpidd to bloat in an obvious way, but as I say we're using the 
default route config, so these should be unreliable. I guess under the 
hood this *really does* use an unreliable link, and not just some 
reasonable number for N when acknowledging?

Thanks for the pointers; even though I'm not much closer to a solution, 
it's nice to have people adding their thoughts - it's going to be 
something really obtuse in the end. I appreciate the moral support!!

Cheers,
Frase





---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

