cxf-issues mailing list archives

From "Christian Schneider (JIRA)"
Subject [jira] [Closed] (DOSGI-173) unregistering an exported service does not remove it from zookeeper (and remote clients)
Date Thu, 27 Jun 2013 17:13:26 GMT


Christian Schneider closed DOSGI-173.

> unregistering an exported service does not remove it from zookeeper (and remote clients)
> ----------------------------------------------------------------------------------------
>                 Key: DOSGI-173
>                 URL:
>             Project: CXF Distributed OSGi
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Amichai Rothman
>            Assignee: Amichai Rothman
>             Fix For: 1.5.0
>         Attachments: fix_zk_unregisteration.diff
> I have some bundles exporting and consuming services, running on two hosts. I've noticed
more than once that while stopping and starting different bundles on the two hosts (just playing
around with them manually to see how robust the distributed system is), at some point one
of the hosts doesn't see that a service it was using from the other host is down. Connecting
to ZooKeeper directly, I see the node for that service is still there, i.e. the service was
not properly removed from ZK even though the bundle is stopped and service is gone.
> Investigating this is a bit tricky, since it involves various trackers, endpoint listeners
and service listeners and there is not enough code documentation to understand what the intended
flow is... however I've found a few interesting related findings that may point at the solution:
> 1. Following the logs and some debugging, it appears that the problem is not with the
discovery.zookeeper package/bundle itself, since the endpoint removed event never gets there.
> 2. In EndpointListenerNotifier.notifyListenersOfRemoval(), the EndpointDescription appears
to be null, so there is never a filter match and the endpointRemoved callback is never triggered
on the EndpointListeners. This is because all of the ExportRegistrations are already closed
by the time they get there. It seems that the premature closing is done by the service tracker
created in ExportRegistrationImpl.startServiceTracker(). My guess is that the order in which
the service tracker and service listener (in TopologyManagerExport, which triggers the EndpointListenerNotifier)
receive the events is arbitrary depending on some race condition somewhere, which may explain
why this is an inconsistently reproducible bug. I would like to say that the solution is to
get rid of the service tracker altogether (it doesn't do anything else, and as a separate
bug, is never closed), but I'm not sure why it was introduced in the first place or if there
are any other scenarios in which it was necessary, so I really don't know what the proper
solution should be.
> 3. Another element that may have been masking this bug to some degree is the local discovery
bundle which was running, and during debugging I saw it triggering some EndpointListener removal
events which were picked up by the other components. I'm not entirely sure yet of what this
bundle does (I didn't find any mention of it on the website, and didn't get to the code yet),
but I just leave this bundle in the stopped state for now, with no visible effects on the
testing, making debugging easier.
> 4. An additional related issue which bugged me during a previous code review was that
InterfaceMonitorManager.addInterest() is closing and recreating an InterfaceMonitor every
time it is invoked with an existing scope, even though the old and new IMs monitor the same
ZK node and are practically identical - so why not just leave the old monitor running? This
replacement causes a bunch of unnecessary extra work (including several ZK server accesses),
a flurry of unnecessary filter-matching logs, and an unnecessary gap in monitoring for ZK
changes. This also relates to the bug at hand since InterfaceMonitor.close() also sends some
EndpointListener notifications about the endpoints being removed, which leaves some gaps in
the registration coverage (before they are re-added moments later) and might interact in some
other unpredictable (at least to me) way with the rest of the mechanism. It seems these IM
close/start cycles sometimes occur tens of times in a row.
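
A minimal sketch of the reuse idea from point 4, assuming one monitor per scope is enough. ScopeMonitor and ScopeMonitorManager are hypothetical stand-ins written for illustration; they are not the DOSGi InterfaceMonitor or InterfaceMonitorManager classes, and the reference counting is an assumption about how interests might be released.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-scope ZooKeeper monitor. */
class ScopeMonitor {
    final String scope;
    boolean running;

    ScopeMonitor(String scope) {
        this.scope = scope;
    }

    void start() {
        running = true;  // a real monitor would register its ZK watch here
    }

    void close() {
        running = false; // a real monitor would also report its endpoints as removed
    }
}

/** Hypothetical manager that starts a monitor once per scope and reuses it afterwards. */
class ScopeMonitorManager {
    private final Map<String, ScopeMonitor> monitors = new HashMap<>();
    private final Map<String, Integer> interestCounts = new HashMap<>();
    int startCount; // how many monitors were ever started, kept for illustration

    synchronized ScopeMonitor addInterest(String scope) {
        interestCounts.merge(scope, 1, Integer::sum);
        ScopeMonitor monitor = monitors.get(scope);
        if (monitor == null) {
            // First interest in this scope: create and start the monitor.
            monitor = new ScopeMonitor(scope);
            monitor.start();
            startCount++;
            monitors.put(scope, monitor);
        }
        // Existing scope: the running monitor is returned untouched.
        return monitor;
    }

    synchronized void removeInterest(String scope) {
        Integer count = interestCounts.get(scope);
        if (count == null) {
            return; // unknown scope, nothing to release
        }
        if (count > 1) {
            interestCounts.put(scope, count - 1);
            return; // other interests remain, keep monitoring
        }
        interestCounts.remove(scope);
        ScopeMonitor monitor = monitors.remove(scope);
        if (monitor != null) {
            monitor.close(); // last interest gone, stop monitoring
        }
    }
}
```

With this shape, a repeated addInterest() call for the same scope does not close anything, so there is no gap in monitoring and no burst of removal notifications between the close and the restart.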
> To sum it up, there's definitely a bug occurring. When I tested a bit with fixes for
both potential causes above (IM stop/start replaced with a single start the first time a given
scope is encountered, and close invocation in service tracker removed) - I could no longer
recreate the bug, but I don't understand all the component interactions well enough to know
if there are any side effects, or why they were implemented this way in the first place (I
tend to assume there was a good reason for it which I'm unaware of).
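
The ordering hazard described in point 2 can be sketched as follows, assuming that closing a registration makes its endpoint description unavailable. SketchRegistration, SketchNotifier and OrderingDemo are hypothetical names invented for this illustration; they only model the two handlers reacting to the same unregistration event and are not the DOSGi classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical registration whose endpoint id is no longer available once closed. */
class SketchRegistration {
    private String endpointId;

    SketchRegistration(String endpointId) {
        this.endpointId = endpointId;
    }

    String getEndpointId() {
        return endpointId; // null after close()
    }

    void close() {
        endpointId = null;
    }
}

/** Hypothetical notifier that skips registrations it cannot describe. */
class SketchNotifier {
    final List<String> removed = new ArrayList<>();

    void notifyRemoval(SketchRegistration registration) {
        String id = registration.getEndpointId();
        if (id == null) {
            return; // nothing to match a filter against, so no callback fires
        }
        removed.add(id);
    }
}

class OrderingDemo {
    /** Runs the two handlers in the given order and returns what the listeners saw. */
    static List<String> run(boolean closeFirst) {
        SketchRegistration registration = new SketchRegistration("endpoint-1");
        SketchNotifier notifier = new SketchNotifier();
        if (closeFirst) {
            registration.close();                // tracker-style handler wins the race
            notifier.notifyRemoval(registration);
        } else {
            notifier.notifyRemoval(registration); // listener-style handler wins the race
            registration.close();
        }
        return notifier.removed;
    }

    public static void main(String[] args) {
        System.out.println("close first:  " + run(true));
        System.out.println("notify first: " + run(false));
    }
}
```

When the close runs first, the listener list stays empty, which matches the symptom of the ZooKeeper node never being removed. When the notification runs first, the removal is reported. This is only a model of the suspected race, not a confirmation of where the ordering is decided in the real code.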
