qpid-users mailing list archives

From Alan Conway <acon...@redhat.com>
Subject Re: proton Messenger error handling/recovery REQUEST FEEDBACK!
Date Fri, 05 Sep 2014 13:17:36 GMT
On Thu, 2014-09-04 at 18:28 +0100, Fraser Adams wrote:
> On 03/09/14 23:29, Alan Conway wrote:
> > On Wed, 2014-09-03 at 20:05 +0100, Fraser Adams wrote:
> >> Hello,
> >> I've probably missed something, but I don't know how to reliably detect
> >> failures and reconnect.
> >>
> >> So if I sent to an address with a freshly stood up Messenger instance
> >> and the address can't be found things aren't too bad and I wind up with
> >> an ECONNREFUSED that I could do something with, however if I've been
> >> sending messages to a valid address then I kill off the consumer I see a:
> >>
> >> [0x513380]:ERROR amqp:connection:framing-error connection aborted
> >> [0x513380]:ERROR[-2] connection aborted
> >>
> >> CONNECTION ERROR connection aborted (remote)
> >>
> >> The thing is that all of these are *internally* generated messages sent
> >> to the console via fprintf, so my *application* doesn't really know
> >> about them (though I could be crafty and interpose my own cheeky fprintf
> >> to intercept them). That doesn't quite sound like the desired behaviour
> >> for a robust system?
> >>
> >>
> >> Similarly should I actually trap an error what's the correct way to
> >> continue, as it happens currently my app carries on silently doing
> >> nothing useful and continuing to do so even when the peer restarts (so
> >> there is no magic internal reconnection logic as far as I can see).
> >>
> >> do I have to do a
> >> messenger.stop()
> >> messenger.start()
> >>
> >> cycle to get things going again? I'm guessing so, but I'd like to know
> >> what the "correct"/expected way to create Messenger code that is robust
> >> against remote failures, as far as I can see there are no examples of
> >> that sort of thing?
> > I've come up against similar problems, I think it's an area that needs
> > some work in Proton. Is anybody already working on/thinking about this
> > area?
> >
> > Cheers,
> > Alan.
> >
> I'd definitely like to know how others deal with this sort of thing.

I cheat. I've been using proton in dispatch system tests, and I come up
against these issues when I start up some proton/dispatch network and
try to use it too quickly, before things have settled down. I have some
tweaks in my test harness to wait till things are ready so there are no
errors :) That's not a solution for general non-test situations,
although knowing how to wait till things are ready is always useful.

https://svn.apache.org/repos/asf/qpid/dispatch/trunk/tests/system_test.py

The Messenger class there adds a "flush" method that pumps the Messenger
event loop till there is no more work to do. Otherwise subscribe() in
particular gives no way to tell when the subscription is active.
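
To make the idea concrete, here is a rough sketch of such a flush loop
(not the code from system_test.py). It assumes the messenger object has a
work(timeout) method that returns a true value when it did something and
a false value when it timed out with nothing to do. FakeMessenger,
poll_seconds and max_rounds are made-up names, there only so the sketch
runs without proton installed.

```python
def flush(messenger, poll_seconds=0.01, max_rounds=1000):
    """Pump the messenger until work() reports nothing left to do.

    Assumption: messenger.work(timeout) returns a true value when it did
    some work and a false value when it timed out with nothing to do.
    Returns the number of rounds that did work.
    """
    rounds = 0
    # max_rounds is a safety cap so a busy peer cannot keep us here forever.
    while rounds < max_rounds and messenger.work(poll_seconds):
        rounds += 1
    return rounds


class FakeMessenger:
    """Hypothetical stand-in that pretends to have a fixed amount of work."""

    def __init__(self, pending):
        self.pending = pending

    def work(self, timeout):
        if self.pending == 0:
            return False
        self.pending -= 1
        return True
```

With a real Messenger you would call flush(messenger) right after
subscribe() and before sending anything that depends on the subscription.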

Note: My situation is a bit special in that dispatch creates addresses
dynamically on subscribe and my tests involve slow stuff like waypoints
to brokers etc. That introduces a delay in subscribe that probably isn't
visible when the address is created beforehand. 

There's also Qpidd.wait_ready and Qdrouterd.wait_ready, which wait for
qpidd and the dispatch router respectively to be ready, so I can be sure
that when I connect with proton they'll be listening. Those wait for the
expected listening ports to be connectable, and in the case of dispatch
also do a qmf check to make sure that all expected outgoing connectors
are there.
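
The "wait for the port to be connectable" part can be sketched with
nothing but the standard library. This is an illustration of the
technique, not the wait_ready code from the test harness; wait_port and
its parameters are names I've made up here, and the qmf check is not
covered.

```python
import socket
import time


def wait_port(port, host="127.0.0.1", timeout=10.0, interval=0.1):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening; close at once.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test would start the broker or router process, call wait_port on each
expected listener, and only then create the Messenger.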

> 
> For info notwithstanding not necessarily being able to trap all the 
> errors without being devious around fprintf  (which to be fair works, 
> but it's a bit sneaky and if you have multiple Messenger instances won't 
> tell you which one the error relates to) but when I do get an error I 
> appear to have to start from scratch - in other words:
> 
> message.free();
> messenger.free();
> message = new proton.Message();
> messenger = new proton.Messenger();
> messenger.start();
> 
> If I try to restart the original messenger or use existing queue I get 
> no joy. It's not the end of the world but I've no idea what robust 
> Messenger code is *supposed* to look like.
> 
> Presumably Alan and I aren't the only people who might like to be able 
> to trap errors and restart? Or does every one else write code that never 
> fails ;->

I always wondered how everybody but me can do that. Sigh. For you and me
I think we need to do some work on proton's error handling. 

- proton (or any library!) should NEVER EVER write anything directly to
stdout or stderr. It needs a (very simple) logging facility that
writes to stderr by default but can be redirected elsewhere.
- proton should never log an error without also returning some useful
error condition to the application.
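
As a rough illustration of both points (in Python for brevity, and NOT an
existing proton API): a log sink that defaults to stderr but can be
swapped out, and a failure path that logs and also reports the error to
the caller. set_log_sink, TransportError and connection_aborted are all
hypothetical names.

```python
import sys

_log_sink = sys.stderr.write  # default destination is stderr


def set_log_sink(sink):
    """Redirect library log output; sink is any callable taking one string."""
    global _log_sink
    _log_sink = sink


class TransportError(Exception):
    """Hypothetical error type carrying a condition name and description."""

    def __init__(self, condition, description):
        super().__init__("%s: %s" % (condition, description))
        self.condition = condition
        self.description = description


def connection_aborted():
    """Log the failure AND hand it to the caller, never just one of the two."""
    err = TransportError("amqp:connection:framing-error", "connection aborted")
    _log_sink("ERROR %s\n" % err)
    raise err
```

An application with several Messenger instances could then install its
own sink and catch the error at the call site, instead of intercepting
fprintf.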

Proton has some useful pn_error_* functions; they just need to be used
more widely. In dispatch I introduced an errno-style thread-local error
code/message (in proton it would be a pn_error_t*). That allows sensible
error messages out of functions that want to return something else (e.g.
return a pointer or null and set the thread error). It also allows you to
work around lazy error handling (temporarily of course (hahahaha)): a
caller a couple of stack frames up can detect an error even if
intermediate functions didn't check & propagate errors properly. I'm not
advocating lazy error checking, but in C it is hard to get everything.
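
Here is the shape of the errno-style idea, sketched in Python rather than
C and not taken from the dispatch source. All the names (set_error,
last_error, lookup, careless_middle) are made up for the example; in C
the storage would be a thread-local variable rather than threading.local.

```python
import threading

_state = threading.local()  # each thread sees its own "error" attribute


def set_error(code, message):
    _state.error = (code, message)


def clear_error():
    _state.error = None


def last_error():
    """Return (code, message) for this thread, or None if no error is set."""
    return getattr(_state, "error", None)


def lookup(table, key):
    """Return a value or None; on failure also record why."""
    if key not in table:
        set_error(-2, "no such address: %s" % key)
        return None
    return table[key]


def careless_middle(table, key):
    # Intermediate function that neither checks nor propagates the failure.
    return lookup(table, key)
```

A caller of careless_middle() that gets None back can still call
last_error() to find out what went wrong, and an error set in one thread
is not visible from another.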

FEEDBACK PLEASE: anyone think this is a great/horrible idea? Does proton
already do things I've missed that would make this unnecessary?

Cheers,
Alan.

