felix-dev mailing list archives

From "Alexander Klimetschek (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FELIX-5410) Web console plugin for troubleshooting wiring issues
Date Wed, 30 Nov 2016 21:21:58 GMT

    [ https://issues.apache.org/jira/browse/FELIX-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709807#comment-15709807
] 

Alexander Klimetschek edited comment on FELIX-5410 at 11/30/16 9:21 PM:
------------------------------------------------------------------------

To track the *origin of dynamically registered services*, a [ServiceListener|https://osgi.org/javadoc/r6/core/org/osgi/framework/ServiceListener.html]
could be used. It would track the (last) dynamic unregistration of each service and inspect the
stack, which looks something like this:

{noformat}
listener: at org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.serviceChanged(BundleComponentActivator.java:120)
          at org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:991)
          at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:839)
          at org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:546)
          at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4557)
          at org.apache.felix.framework.Felix.access$000(Felix.java:106)
          at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:420)
          at org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:170)
          at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:144)
origin:   at org.apache.jackrabbit.oak.plugins.metric.StatisticsProviderFactory.deactivate(StatisticsProviderFactory.java:103)
{noformat}

This would be stored in a map of service -> origin (class name).
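
As a rough illustration of the idea above, the following sketch picks the origin class out of a captured stack and stores it in a map. It assumes the Felix-internal frame {{ServiceRegistrationImpl.unregister}} from the trace marks the framework boundary; the class {{OriginFinder}}, its method names, and the use of a numeric service id as the map key are hypothetical choices for this example, not part of any patch. In a real listener the stack would come from something like {{Thread.currentThread().getStackTrace()}} inside {{serviceChanged()}}.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: derive the "origin" class of a service unregistration from a captured stack. */
final class OriginFinder {

    // Assumption: this Felix-internal frame marks the framework boundary (taken from the trace above).
    static final String BOUNDARY_CLASS = "org.apache.felix.framework.ServiceRegistrationImpl";
    static final String BOUNDARY_METHOD = "unregister";

    // Hypothetical store: service id -> origin class name.
    static final Map<Long, String> ORIGINS = new ConcurrentHashMap<>();

    /** Returns the class name of the frame that called the boundary frame, or null if not found. */
    static String findOrigin(StackTraceElement[] stack) {
        // Index 0 is the most recent call, so the caller of frame i is frame i + 1.
        for (int i = 0; i < stack.length - 1; i++) {
            StackTraceElement frame = stack[i];
            if (BOUNDARY_CLASS.equals(frame.getClassName())
                    && BOUNDARY_METHOD.equals(frame.getMethodName())) {
                return stack[i + 1].getClassName();
            }
        }
        return null;
    }

    /** Meant to be called from a ServiceListener when an UNREGISTERING event arrives. */
    static void recordUnregistration(long serviceId, StackTraceElement[] stack) {
        String origin = findOrigin(stack);
        if (origin != null) {
            ORIGINS.put(serviceId, origin); // keeps only the last unregistration per service
        }
    }

    public static void main(String[] args) {
        // Shortened copy of the trace above, used as sample input.
        StackTraceElement[] sample = {
            new StackTraceElement("org.apache.felix.framework.ServiceRegistry",
                    "unregisterService", "ServiceRegistry.java", 170),
            new StackTraceElement("org.apache.felix.framework.ServiceRegistrationImpl",
                    "unregister", "ServiceRegistrationImpl.java", 144),
            new StackTraceElement("org.apache.jackrabbit.oak.plugins.metric.StatisticsProviderFactory",
                    "deactivate", "StatisticsProviderFactory.java", 103),
        };
        recordUnregistration(42L, sample);
        System.out.println(ORIGINS);
    }
}
```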

In contrast, for a service managed by SCR the stack trace shows {{org.apache.felix.scr}} as the origin:

{noformat}
listnr: at org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.serviceChanged(BundleComponentActivator.java:120)
        at org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:991)
        at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:839)
        at org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:546)
        at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4557)
        at org.apache.felix.framework.Felix.access$000(Felix.java:106)
        at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:420)
        at org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:170)
        at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:144)
scr:    at org.apache.felix.scr.impl.manager.AbstractComponentManager$3.unregister(AbstractComponentManager.java:883)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager$3.unregister(AbstractComponentManager.java:857)
        at org.apache.felix.scr.impl.manager.RegistrationManager.changeRegistration(RegistrationManager.java:140)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterService(AbstractComponentManager.java:925)
{noformat}

In that case the origin would probably be the service implementation itself (which might fail
to start because of an exception in its activate method).

While this is implementation-specific (bound to Felix and requires knowing its internal
package names), for a troubleshooting tool this is OK. It can be adapted for newer Felix versions
where things might change.
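
To keep the Felix-specific knowledge in one adjustable place, the package prefixes could be passed in as configuration. The sketch below is only an illustration under that assumption: {{OriginClassifier}}, {{Kind}} and {{effectiveOrigin}} are hypothetical names, and the fallback to the service implementation class follows the "probably" stated above rather than any confirmed behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: decide whether an origin class belongs to SCR, using configurable package prefixes. */
final class OriginClassifier {

    enum Kind { SCR_MANAGED, DYNAMIC }

    // Assumption: prefixes are supplied by the tool's configuration so they can follow Felix changes.
    private final List<String> scrPrefixes;

    OriginClassifier(List<String> scrPrefixes) {
        this.scrPrefixes = new ArrayList<>(scrPrefixes);
    }

    Kind classify(String originClassName) {
        for (String prefix : scrPrefixes) {
            // Match whole package segments only, so "org.apache.felix.scrx" does not count.
            if (originClassName.equals(prefix) || originClassName.startsWith(prefix + ".")) {
                return Kind.SCR_MANAGED;
            }
        }
        return Kind.DYNAMIC;
    }

    /** For SCR-managed services, fall back to the service implementation class as the origin. */
    String effectiveOrigin(String originClassName, String serviceImplClassName) {
        return classify(originClassName) == Kind.SCR_MANAGED ? serviceImplClassName : originClassName;
    }

    public static void main(String[] args) {
        OriginClassifier classifier = new OriginClassifier(Arrays.asList("org.apache.felix.scr"));
        System.out.println(classifier.classify(
                "org.apache.felix.scr.impl.manager.AbstractComponentManager$3"));
        System.out.println(classifier.classify(
                "org.apache.jackrabbit.oak.plugins.metric.StatisticsProviderFactory"));
    }
}
```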

Once the origin class/package is known, the tool can check the bundle's state at troubleshooting
time and possibly grep the error log file for any messages from that class or package and
provide them as hints.
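
The log lookup could be as simple as a substring search. The following sketch assumes the error log is a plain text file whose lines contain fully qualified logger names; {{LogHints}} and its methods are hypothetical names for this example. Searching for the package also finds lines that mention the class itself, since the class name starts with the package name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: collect log lines that mention the origin class or its package. */
final class LogHints {

    /** Returns the package part of a fully qualified class name, or "" if there is none. */
    static String packageOf(String className) {
        int dot = className.lastIndexOf('.');
        return dot < 0 ? "" : className.substring(0, dot);
    }

    /** Returns all lines containing the given class or package name, in their original order. */
    static List<String> linesMentioning(List<String> logLines, String needle) {
        List<String> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits; // an empty needle would match every line
        }
        for (String line : logLines) {
            if (line.contains(needle)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: LogHints <log file> <origin class name>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String hit : linesMentioning(lines, packageOf(args[1]))) {
            System.out.println(hit);
        }
    }
}
```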

In our Sling-based application, the JCR repository (database) is registered dynamically, and
most of the application bundles depend on it directly or indirectly. Its startup can be prone
to various low-level exceptions (persistence problems, configuration issues), which prevent
the dynamic registration. However, the exception message easily gets lost in the error log,
as usually there is a lot more going on when the repository restarts. A troubleshooting tool
that can find this automatically (i.e. without knowing about the specific service names) would
be useful.

The question is whether capturing the stack trace for each service unregistration might be too costly.
See http://stackoverflow.com/questions/2347828/how-expensive-is-thread-getstacktrace
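
One possible way to bound that cost, assuming a Java 9+ runtime (which is an assumption, not something stated in this issue), is {{java.lang.StackWalker}}: it walks frames lazily, so the walk can stop at the first frame past the boundary instead of materializing the whole stack. The sketch below is illustrative only; {{LazyOriginWalker}} and the nested {{Boundary}} and {{Origin}} classes are hypothetical stand-ins for the Felix and application frames.

```java
import java.util.Optional;

/** Sketch: find the caller of a boundary frame lazily with StackWalker (Java 9+). */
final class LazyOriginWalker {

    /** Returns the class name of the frame that called boundaryClass.boundaryMethod, if present. */
    static Optional<String> callerOf(String boundaryClass, String boundaryMethod) {
        return StackWalker.getInstance().walk(frames -> frames
                // Skip frames until the boundary frame is reached...
                .dropWhile(f -> !(f.getClassName().equals(boundaryClass)
                        && f.getMethodName().equals(boundaryMethod)))
                // ...then skip the boundary itself and take its direct caller.
                .skip(1)
                .map(StackWalker.StackFrame::getClassName)
                .findFirst());
    }

    // Hypothetical stand-in for ServiceRegistrationImpl.unregister().
    static final class Boundary {
        Optional<String> unregister() {
            return callerOf(Boundary.class.getName(), "unregister");
        }
    }

    // Hypothetical stand-in for a component that unregisters its service on deactivation.
    static final class Origin {
        Optional<String> deactivate() {
            return new Boundary().unregister();
        }
    }

    static Optional<String> demo() {
        return new Origin().deactivate();
    }

    public static void main(String[] args) {
        System.out.println(demo().orElse("<not found>"));
    }
}
```

Whether this is actually cheaper than {{Thread.getStackTrace()}} in a given framework version would still need to be measured.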


> Web console plugin for troubleshooting wiring issues
> ----------------------------------------------------
>
>                 Key: FELIX-5410
>                 URL: https://issues.apache.org/jira/browse/FELIX-5410
>             Project: Felix
>          Issue Type: New Feature
>          Components: Web Console
>            Reporter: Alexander Klimetschek
>         Attachments: FELIX-5410-with-services.patch, FELIX-5410.patch, webconsole-troubleshoot-services.png,
webconsole-troubleshoot.png
>
>
> h4. Feature
> Add a new view/plugin to the standard webconsole that helps to pinpoint which bundles,
services or components are the true source of inactive bundles or services.
> * For *bundles* the underlying assumption would be a healthy system with all bundles
active, and thus any inactive bundle can be shown and analyzed as being problematic.
> * For *services/components* one can look at inactive _immediate_ services that fail because
of unsatisfied references. For others, the user might need to enter the "problematic" service
or component they expect to be running to start the analysis.
> h4. Motivation
> In a larger OSGi application with many bundles and components, it can be difficult to
find the root cause of why certain bundles do not start or why a service is not active, especially
for folks new to OSGi or with limited knowledge about the application. I have seen many people
fail, and thus "not like" OSGi because of such hurdles during development, where it is easy
to update one bundle but miss out on crucial dependencies.
> Figuring this out is possible through the current web console, but only for experts who
click through the bundle or service details. This is usually tedious work if, for example,
a lower-level bundle is the problem and 200 others are not active because of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
