felix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pierre De Rop <pierre.de...@gmail.com>
Subject Re: [DS] time for 1.8 release?
Date Tue, 29 Oct 2013 15:46:14 GMT
yes David, consistently. but I don't see stacktraces.

Now, looking at the code, I see that this stacktrace is logged in DEBUG:

    final void dumpThreads()
    {
        try
        {
            String dump = new ThreadDump().call();
            log( LogService.LOG_DEBUG, dump, null );

So, I will activate DEBUG and provide the stacktrace.

But is this a bug to log this stacktrace in DEBUG ?
I mean: should it be logged in WARN, instead of DEBUG ?

/pierre


On Tue, Oct 29, 2013 at 4:28 PM, David Jencks <david_jencks@yahoo.com>wrote:

> Hi Pierre,
>
> I think there should be a thread dump in the log to go with this message.
>  If there is could you send it to me or attach it to FELIX-4297?
>
> Does this happen consistently?
>
> many thanks
> david jencks
>
> On Oct 29, 2013, at 3:43 AM, Pierre De Rop <pierre.derop@gmail.com> wrote:
>
> > Hi David;
> >
> > 1) Every integration tests are passing OK.
> >
> > Regarding the OOM, I believe that the out of memory is likely to be cause
> > by the following load tests, which are running in DEBUG mode:
> >
> >
> src/test/java/org/apache/felix/scr/integration/ComponentConcurrencyTest.java
> > src/test/java/org/apache/felix/scr/integration/Felix3680Test.java
> > src/test/java/org/apache/felix/scr/integration/Felix3680_2Test.java
> >
> >
> > So, I just added this in each files (in static initializers):
> >
> >        DS_LOGLEVEL = "warn";
> >
> > And I have no more OOM, with this nice test result message:
> >
> > Tests run: 103, Failures: 0, Errors: 0, Skipped: 0
> > [INFO] BUILD SUCCESS
> >
> >
> >
> > 2) Now, I did again a non regression tests with our applications, and I'm
> > seeing a new WARN message (but it does not seem to prevent the
> application
> > to be correctly initialized: all components are in the end satisfied):
> >
> > (if you believe that it's now time to release, then you can go ahead, and
> > we'll then investigate this new issue after the release ... I let you
> > decide)
> >
> > ->
> >
> > 2013-10-29 11:10:02,487 CM Event Dispatcher (Fire ConfigurationEvent:
> > pid=com.alcatel.as.ioh.impl.server.ServerFactoryImpl) ERROR osgi  -
> > [com.alcatel.as.ioh.impl.server.ServerFactoryImpl(91)] DependencyManager
> :
> > invokeBindMethod : timeout on open latch newtcpprocessor
> >
> >
> >
> >
> > kind regards;
> > /Pierre
> >
> >
> > On Tue, Oct 29, 2013 at 12:13 AM, David Jencks <david_jencks@yahoo.com
> >wrote:
> >
> >> Pierre,
> >>
> >> I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed the
> >> problems I found (for 2).  I don't see the OOM often enough to have any
> >> confidence that anything I do would actually fix it, so I'm inclined to
> do
> >> nothing.  Is that OK with you?
> >>
> >> Unless you can find some more problems :-)  I'm planning to try another
> >> release when the config admin 1.8 gets to maven central.  I'm going to
> >> update the pom to normally run against the CA 1.8 version supporting R5
> and
> >> change the profile so running against R4 requires specifying profiles
> >> explicitly.
> >>
> >> thanks again!
> >> david jencks
> >>
> >> On Oct 28, 2013, at 12:24 AM, David Jencks <david_jencks@yahoo.com>
> wrote:
> >>
> >>> Hi Pierre,
> >>>
> >>> Much better to find these problems before a release than just after!
> >>>
> >>> I saw an OOM once recently but haven't been able to reproduce it.
> >>>
> >>> I'm looking into the NPE.  I think I see the timing hole it is using
> but
> >> need to think about it some more.
> >>>
> >>> many thanks!
> >>> david jencks
> >>>
> >>> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <pierre.derop@gmail.com>
> >> wrote:
> >>>
> >>>> Hi David,
> >>>>
> >>>> Looking at our configurator component we are currently using (but we
> >> will fix it in order to use the multi-location "?"), I see this:
> >>>>
> >>>> void configure(String pid, Dictionary pidConf) {
> >>>>    Configuration config = getConfiguration(_pid, null);
> >>>>    if (config.getBundleLocation() != null) {
> >>>>        config.setBundleLocation(null);
> >>>>    }
> >>>>    config.update(pidConf);
> >>>> }
> >>>>
> >>>> So I believe that you are getting a null configuration because there
> is
> >> a short window between the setBundleLocation(null) (at this point, the
> >> configuration is null) and the config.update(pidConf) call ...
> >>>>
> >>>> So, the good news is that I'm not having anymore some NPE using your
> >> latest commits :-) and I think our application is now fully operational.
> >>>>
> >>>> but ... (please don't start to abominate me  ) now, in order to do a
> >> final check, I restarted the integration tests and there is still two
> >> problems:
> >>>>
> >>>> 1) I'm sometimes getting some out of memory errors: this is probably
> >> caused by the ComponentConcurrencyTest/Felix3680Test tests, which are
> >> currently configured in DEBUG mode ?
> >>>>
> >>>> 2) I ran the tests two times, and the second time, I got this
> exception
> >> with the failing
> >>>> Felix3680_2Test:
> >>>>
> >>>>
> >>
> test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test)
> >> Time elapsed: 36.597 sec  <<< ERROR!
> >>>> java.lang.NullPointerException
> >>>>       at
> >>
> org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710)
> >>>>       at
> >>
> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387)
> >>>>       at
> >>
> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355)
> >>>>       at
> >>
> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290)
> >>>>       at
> >>
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503)
> >>>>       at
> >>
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398)
> >>>>       at
> >>
> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258)
> >>>>       at
> >>
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437)
> >>>>       at
> >>
> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
> >>>>       at
> >>
> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
> >>>>       at
> >>
> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
> >>>>       at
> >> org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
> >>>>       at org.apache.felix.framework.Felix.access$000(Felix.java:74)
> >>>>       at
> >> org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
> >>>>       at
> >>
> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
> >>>>       at
> >>
> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
> >>>>       at
> >>
> org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136)
> >>>>       at
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>>>       at
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>>       at java.lang.Thread.run(Thread.java:722)
> >>>>
> >>>> Are you also getting this exception ?
> >>>>
> >>>> thanks
> >>>>
> >>>> /Pierre
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks <david_jencks@yahoo.com
> >
> >> wrote:
> >>>> Hi PIerre,
> >>>>
> >>>> Looking at the CA spec it looks like CA is supposed to send out
> >> CM_LOCATION_CHANGED events even before any properties are set when
> >> setBundleLocation is called.  I added some code to ignore these events.
> >> Note that DS is "reserving" the configurations for (one of) the
> >> component(s) that will be consuming them by calling
> getConfiguration(pid).
> >>>>
> >>>> I do wonder how the location to something non-null on your
> >> configurations before the properties are set.
> >>>>
> >>>> Waiting for the next bug :-)
> >>>>
> >>>> thanks
> >>>> david jencks
> >>>>
> >>>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <pierre.derop@gmail.com>
> >> wrote:
> >>>>
> >>>>> Hello David,
> >>>>>
> >>>>> The code we are using to configure our components is old, at at
the
> >> time we wrote it, configadmin was not supporting multi-location. But I
> do
> >> agree, we can now use the "?" multi-location.
> >>>>>
> >>>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not
> >> always):
> >>>>>
> >>>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire ConfigurationEvent:
> >> pid=sipagent) ERROR osgi  - [43] Unexpected problem delivering
> >> configuration event to [org.osgi.service.cm.ConfigurationListener,
> id=102,
> >>
> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
> >>>>>
> >>>>> java.lang.NullPointerException
> >>>>>       at
> >>
> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
> >>>>>       at
> >>
> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
> >>>>>       at
> >>
> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344)
> >>>>>       at
> >>
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
> >>>>>       at
> >>
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
> >>>>>       at
> >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
> >>>>>       at java.lang.Thread.run(Thread.java:722)
> >>>>>
> >>>>>
> >>>>> I'm not sure, but it seems that ConfigAdmin is providing a null
> >> dictionary, when delivering a CM_LOCATION_CHANGED event ? if correct,
> then
> >> Is this a normal behavior ?
> >>>>>
> >>>>> This is strange; perhaps I shall start a new integration test ?
> >>>>>
> >>>>> /Pierre
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks <
> david_jencks@yahoo.com>
> >> wrote:
> >>>>> Hi Pierre,
> >>>>>
> >>>>> This pointed out a logic error I introduced for Felix 3651.  I opened
> >> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error I
> >> found which I think explains the NPE.  Could you check this?
> >>>>>
> >>>>> Could I ask what you are trying to do by setting the bundleLocation
> to
> >> null?  If you want to allow any bundle to receive the configuration you
> >> could use multi-location support and set the location to "?"  With the
> code
> >> you have now, if the configuration is already in use by a DS component,
> the
> >> location changed event will result in the bundle location being reset
> back
> >> to what it was.
> >>>>>
> >>>>> thanks!
> >>>>> david jencks
> >>>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <pierre.derop@gmail.com>
> >> wrote:
> >>>>>
> >>>>>> Hi David,
> >>>>>>
> >>>>>> thanks; The fix is fixing the problem :-)
> >>>>>>
> >>>>>> but ... there's now a new different problem: i'm now sometimes
> >> getting this
> >>>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event:
> >>>>>>
> >>>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire
> ConfigurationEvent:
> >>>>>> pid=sipagent) ERROR osgi  - [43] Unexpected problem delivering
> >>>>>> configuration event to [org.osgi.service.cm.ConfigurationListener,
> >> id=102,
> >>>>>>
> >>
> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
> >>>>>>
> >>>>>> java.lang.NullPointerException
> >>>>>>     at
> >>>>>>
> >>
> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
> >>>>>>     at
> >>>>>>
> >>
> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
> >>>>>>     at
> >>>>>>
> >>
> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390)
> >>>>>>     at
> >>>>>>
> >>
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
> >>>>>>     at
> >>>>>>
> >>
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
> >>>>>>     at
> >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
> >>>>>>     at java.lang.Thread.run(Thread.java:722)
> >>>>>>
> >>>>>> Perhaps a new jira issue shall be opened ?
> >>>>>>
> >>>>>> I think we are getting a CM_LOCATION_CHANGED event because in
our
> >>>>>> application, we populate configuration admin by doing something
like
> >> this:
> >>>>>>
> >>>>>> Configuration cfg = cm.getConfiguration(pid, null)
> >>>>>> if (config.getBundleLocation() != null) {
> >>>>>>   config.setBundleLocation(null);
> >>>>>> }
> >>>>>>
> >>>>>> The setBundleLocation(null) is probably useless, but this leads
to a
> >>>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with
the
> NPE.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Friday, October 25, 2013, David Jencks <david_jencks@yahoo.com>
> >> wrote:
> >>>>>>> Hi Pierre,
> >>>>>>>
> >>>>>>> You are so good at writing useful tests!!
> >>>>>>>
> >>>>>>> I found a place to call setTargets(getProperties()) from
inside
> >>>>>> ComponentFactoryImpl that would have fewer side effects.  Could
you
> >> see if
> >>>>>> this makes your actual applications work properly?  I'm uploading
a
> >>>>>> snapshot.
> >>>>>>>
> >>>>>>> many thanks
> >>>>>>> david jencks
> >>>>>>>
> >>>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop <pierre.derop@gmail.com
> >
> >> wrote:
> >>>>>>>
> >>>>>>>> Hi David,
> >>>>>>>>
> >>>>>>>> Since this application is complex, I'm not able to provide
logs
> >> because
> >>>>>>>> there are hundreds of components involved which are
not mine, and
> >> for
> >>>>>> now,
> >>>>>>>> I'm not able to diagnose the problem.
> >>>>>>>>
> >>>>>>>> But I have created FELIX-4290, and joined to it an integration
> test
> >> which
> >>>>>>>> seems to reproduce the kind of problem I think I'm having
in my
> >>>>>>>> application. I also joined the proposed patch.
> >>>>>>>>
> >>>>>>>> I did not have time to test the patch you suggested
regarding the
> >>>>>>>> SingleComponentManager.reconfigure method, so let's
continue to
> >>>>>> investigate
> >>>>>>>> using the jira issue and the test I attached to it.
> >>>>>>>>
> >>>>>>>> Thanks;
> >>>>>>>>
> >>>>>>>> /Pierre
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks <
> >> david_jencks@yahoo.com
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Pierre,
> >>>>>>>>>
> >>>>>>>>> I believe you that this code path doesn't work :-)
> >>>>>>>>>
> >>>>>>>>> I think there should be a less invasive way to fix
this.  By any
> >> chance
> >>>>>>>>> can you get a debug-enabled log from when this problem
occurs?
>  It
> >> would
> >>>>>>>>> help confirm my suspicions of what might be missing.
> >>>>>>>>>
> >>>>>>>>> FWIW I suspect SingleComponentManager.reconfigure
is missing a
> >> check for
> >>>>>>>>> m_factoryProperties here (line 561):
> >>>>>>>>>
> >>>>>>>>>         // nothing to do if there is no configuration
(see
> >> FELIX-714)
> >>>>>>>>>         if ( configuration == null && m_configurationProperties
> ==
> >>>>>>>>> null )
> >>>>>>>>>         {
> >>>>>>>>>             log( LogService.LOG_DEBUG, "No configuration
provided
> >> (or
> >>>>>>>>> deleted), nothing to do", null );
> >>>>>>>>>             return;
> >>>>>>>>>         }
> >>>>>>>>>
> >>>>>>>>> Unless we can't figure anything out for sure I'd
prefer to fix
> this
> >>>>>> before
> >>>>>>>>> the release.
> >>>>>>>>>
> >>>>>>>>> thanks
> >>>>>>>>> david jencks
> >>>>>>>>>
> >>>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop <
> pierre.derop@gmail.com
> >>>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi David,
> >>>>>>>>>>
> >>>>>>>>>> (sorry to do all this noise while you are releasing
...)
> >>>>>>>>>>
> >>>>>>>>>> We are indeed using factory components; and
today, I finally
> >> found and
> >>>>>>>>>> fixed a cycle, using the Apache Service Diagnostic
tool; and I'm
> >> going
> >>>>>>>>>> further on but now I'm facing another problem
which I did not
> >> have in
> >>>>>> the
> >>>>>>>>>> scr 1.6.2.
> >>>>>>>>>>
> >>>>>>>>>> So, I would like to discuss about this new problem
with you
> >> before you
> >>>>>>>>> redo
> >>>>>>>>>> a release, in order to decide if this problem
(if there is
> really
> >> one
> >>>>>> ?)
> >>>>>>>>>> shall be addressed now or after the upcoming
release ?
> >>>>>>>>>>
> >>>>>>>>>> So, in our application, we are extensively using
factory
> >> components
> >>>>>>>>>> (@Component(factory=XXX")).
> >>>>>>>>>> When we instantiate a factory component (using
> >>>>>>>>>> ComponentFactory.newInstance()), We pass to
the newInstance()
> >> method
> >>>>>> some
> >>>>>>>>>> additional component properties which may also
contain some
> target
> >>>>>>>>> filters.
> >>>>>>>>>>
> >>>>>>>>>> This allows to dynamically configure the filter
of some
> References
> >>>>>>>>> declared
> >>>>>>>>>> in the factory component.
> >>>>>>>>>> in the scr 1.6.2, this mechanism was working
fine. But using
> >> trunk,
> >>>>>> this
> >>>>>>>>>> does not work all the time. Some target filters
seem to be
> >> correctly
> >>>>>>>>>> configured, and some others are not (I'm not
sure, actually,
> it's
> >> late
> >>>>>>>>> ...).
> >>>>>>>>>>
> >>>>>>>>>> So, it looks like sometimes, some target filters
are not updated
> >> before
> >>>>>>>>>> activating components ? or factory components
?
> >>>>>>>>>>
> >>>>>>>>>> I'm not sure but this might be related to the
old FELIX-3726.
> >>>>>>>>>> Now, interestingly, I did the following patch
and my application
> >> is now
> >>>>>>>>>> working fine: In the  AbstractComponentManager
class, I
> >> systematically
> >>>>>>>>>> update target filters, like this:
> >>>>>>>>>>
> >>>>>>>>>> +++
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>
> src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java
> >>>>>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message