felix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Jencks <david_jen...@yahoo.com>
Subject Re: [DS] time for 1.8 release?
Date Tue, 29 Oct 2013 17:40:09 GMT
HI Pierre,

Do you by any chance see a log message at ERROR level like

Cannot create component instance due to failure to bind reference {0}

shortly before the timeout message?

I'm not sure if it's a good idea to dump the threads at WARN.  If you think it is I'll change
it.  Some people get upset if the logs are too long :-)

thanks
david jencks


On Oct 29, 2013, at 8:46 AM, Pierre De Rop <pierre.derop@gmail.com> wrote:

> yes David, consistently. but I don't see stacktraces.
> 
> Now, looking at the code, I see that this stacktrace is logged in DEBUG:
> 
>    final void dumpThreads()
>    {
>        try
>        {
>            String dump = new ThreadDump().call();
>            log( LogService.LOG_DEBUG, dump, null );
> 
> So, I will activate DEBUG and provide the stacktrace.
> 
> But is this a bug to log this stacktrace in DEBUG ?
> I mean: should it be logged in WARN, instead of DEBUG ?
> 
> /pierre
> 
> 
> On Tue, Oct 29, 2013 at 4:28 PM, David Jencks <david_jencks@yahoo.com>wrote:
> 
>> Hi Pierre,
>> 
>> I think there should be a thread dump in the log to go with this message.
>> If there is could you send it to me or attach it to FELIX-4297?
>> 
>> Does this happen consistently?
>> 
>> many thanks
>> david jencks
>> 
>> On Oct 29, 2013, at 3:43 AM, Pierre De Rop <pierre.derop@gmail.com> wrote:
>> 
>>> Hi David;
>>> 
>>> 1) Every integration tests are passing OK.
>>> 
>>> Regarding the OOM, I believe that the out of memory is likely to be cause
>>> by the following load tests, which are running in DEBUG mode:
>>> 
>>> 
>> src/test/java/org/apache/felix/scr/integration/ComponentConcurrencyTest.java
>>> src/test/java/org/apache/felix/scr/integration/Felix3680Test.java
>>> src/test/java/org/apache/felix/scr/integration/Felix3680_2Test.java
>>> 
>>> 
>>> So, I just added this in each files (in static initializers):
>>> 
>>>       DS_LOGLEVEL = "warn";
>>> 
>>> And I have no more OOM, with this nice test result message:
>>> 
>>> Tests run: 103, Failures: 0, Errors: 0, Skipped: 0
>>> [INFO] BUILD SUCCESS
>>> 
>>> 
>>> 
>>> 2) Now, I did again a non regression tests with our applications, and I'm
>>> seeing a new WARN message (but it does not seem to prevent the
>> application
>>> to be correctly initialized: all components are in the end satisfied):
>>> 
>>> (if you believe that it's now time to release, then you can go ahead, and
>>> we'll then investigate this new issue after the release ... I let you
>>> decide)
>>> 
>>> ->
>>> 
>>> 2013-10-29 11:10:02,487 CM Event Dispatcher (Fire ConfigurationEvent:
>>> pid=com.alcatel.as.ioh.impl.server.ServerFactoryImpl) ERROR osgi  -
>>> [com.alcatel.as.ioh.impl.server.ServerFactoryImpl(91)] DependencyManager
>> :
>>> invokeBindMethod : timeout on open latch newtcpprocessor
>>> 
>>> 
>>> 
>>> 
>>> kind regards;
>>> /Pierre
>>> 
>>> 
>>> On Tue, Oct 29, 2013 at 12:13 AM, David Jencks <david_jencks@yahoo.com
>>> wrote:
>>> 
>>>> Pierre,
>>>> 
>>>> I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed the
>>>> problems I found (for 2).  I don't see the OOM often enough to have any
>>>> confidence that anything I do would actually fix it, so I'm inclined to
>> do
>>>> nothing.  Is that OK with you?
>>>> 
>>>> Unless you can find some more problems :-)  I'm planning to try another
>>>> release when the config admin 1.8 gets to maven central.  I'm going to
>>>> update the pom to normally run against the CA 1.8 version supporting R5
>> and
>>>> change the profile so running against R4 requires specifying profiles
>>>> explicitly.
>>>> 
>>>> thanks again!
>>>> david jencks
>>>> 
>>>> On Oct 28, 2013, at 12:24 AM, David Jencks <david_jencks@yahoo.com>
>> wrote:
>>>> 
>>>>> Hi Pierre,
>>>>> 
>>>>> Much better to find these problems before a release than just after!
>>>>> 
>>>>> I saw an OOM once recently but haven't been able to reproduce it.
>>>>> 
>>>>> I'm looking into the NPE.  I think I see the timing hole it is using
>> but
>>>> need to think about it some more.
>>>>> 
>>>>> many thanks!
>>>>> david jencks
>>>>> 
>>>>> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <pierre.derop@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> Hi David,
>>>>>> 
>>>>>> Looking at our configurator component we are currently using (but
we
>>>> will fix it in order to use the multi-location "?"), I see this:
>>>>>> 
>>>>>> void configure(String pid, Dictionary pidConf) {
>>>>>>   Configuration config = getConfiguration(_pid, null);
>>>>>>   if (config.getBundleLocation() != null) {
>>>>>>       config.setBundleLocation(null);
>>>>>>   }
>>>>>>   config.update(pidConf);
>>>>>> }
>>>>>> 
>>>>>> So I believe that you are getting a null configuration because there
>> is
>>>> a short window between the setBundleLocation(null) (at this point, the
>>>> configuration is null) and the config.update(pidConf) call ...
>>>>>> 
>>>>>> So, the good news is that I'm not having anymore some NPE using your
>>>> latest commits :-) and I think our application is now fully operational.
>>>>>> 
>>>>>> but ... (please don't start to abominate me  ) now, in order to do
a
>>>> final check, I restarted the integration tests and there is still two
>>>> problems:
>>>>>> 
>>>>>> 1) I'm sometimes getting some out of memory errors: this is probably
>>>> caused by the ComponentConcurrencyTest/Felix3680Test tests, which are
>>>> currently configured in DEBUG mode ?
>>>>>> 
>>>>>> 2) I ran the tests two times, and the second time, I got this
>> exception
>>>> with the failing
>>>>>> Felix3680_2Test:
>>>>>> 
>>>>>> 
>>>> 
>> test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test)
>>>> Time elapsed: 36.597 sec  <<< ERROR!
>>>>>> java.lang.NullPointerException
>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710)
>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387)
>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355)
>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290)
>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503)
>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398)
>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258)
>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437)
>>>>>>      at
>>>> 
>> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
>>>>>>      at
>>>> 
>> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
>>>>>>      at
>>>> 
>> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
>>>>>>      at
>>>> org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
>>>>>>      at org.apache.felix.framework.Felix.access$000(Felix.java:74)
>>>>>>      at
>>>> org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
>>>>>>      at
>>>> 
>> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
>>>>>>      at
>>>> 
>> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
>>>>>>      at
>>>> 
>> org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136)
>>>>>>      at
>>>> 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>      at
>>>> 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>      at java.lang.Thread.run(Thread.java:722)
>>>>>> 
>>>>>> Are you also getting this exception ?
>>>>>> 
>>>>>> thanks
>>>>>> 
>>>>>> /Pierre
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks <david_jencks@yahoo.com
>>> 
>>>> wrote:
>>>>>> Hi PIerre,
>>>>>> 
>>>>>> Looking at the CA spec it looks like CA is supposed to send out
>>>> CM_LOCATION_CHANGED events even before any properties are set when
>>>> setBundleLocation is called.  I added some code to ignore these events.
>>>> Note that DS is "reserving" the configurations for (one of) the
>>>> component(s) that will be consuming them by calling
>> getConfiguration(pid).
>>>>>> 
>>>>>> I do wonder how the location to something non-null on your
>>>> configurations before the properties are set.
>>>>>> 
>>>>>> Waiting for the next bug :-)
>>>>>> 
>>>>>> thanks
>>>>>> david jencks
>>>>>> 
>>>>>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <pierre.derop@gmail.com>
>>>> wrote:
>>>>>> 
>>>>>>> Hello David,
>>>>>>> 
>>>>>>> The code we are using to configure our components is old, at
at the
>>>> time we wrote it, configadmin was not supporting multi-location. But I
>> do
>>>> agree, we can now use the "?" multi-location.
>>>>>>> 
>>>>>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not
>>>> always):
>>>>>>> 
>>>>>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire ConfigurationEvent:
>>>> pid=sipagent) ERROR osgi  - [43] Unexpected problem delivering
>>>> configuration event to [org.osgi.service.cm.ConfigurationListener,
>> id=102,
>>>> 
>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>>>>>>> 
>>>>>>> java.lang.NullPointerException
>>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>>>>>>>      at
>>>> 
>> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344)
>>>>>>>      at
>>>> 
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>>>>>>>      at
>>>> 
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>>>>>>>      at
>>>> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>>>>>>>      at java.lang.Thread.run(Thread.java:722)
>>>>>>> 
>>>>>>> 
>>>>>>> I'm not sure, but it seems that ConfigAdmin is providing a null
>>>> dictionary, when delivering a CM_LOCATION_CHANGED event ? if correct,
>> then
>>>> Is this a normal behavior ?
>>>>>>> 
>>>>>>> This is strange; perhaps I shall start a new integration test
?
>>>>>>> 
>>>>>>> /Pierre
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks <
>> david_jencks@yahoo.com>
>>>> wrote:
>>>>>>> Hi Pierre,
>>>>>>> 
>>>>>>> This pointed out a logic error I introduced for Felix 3651. 
I opened
>>>> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error I
>>>> found which I think explains the NPE.  Could you check this?
>>>>>>> 
>>>>>>> Could I ask what you are trying to do by setting the bundleLocation
>> to
>>>> null?  If you want to allow any bundle to receive the configuration you
>>>> could use multi-location support and set the location to "?"  With the
>> code
>>>> you have now, if the configuration is already in use by a DS component,
>> the
>>>> location changed event will result in the bundle location being reset
>> back
>>>> to what it was.
>>>>>>> 
>>>>>>> thanks!
>>>>>>> david jencks
>>>>>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <pierre.derop@gmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>>> Hi David,
>>>>>>>> 
>>>>>>>> thanks; The fix is fixing the problem :-)
>>>>>>>> 
>>>>>>>> but ... there's now a new different problem: i'm now sometimes
>>>> getting this
>>>>>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event:
>>>>>>>> 
>>>>>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire
>> ConfigurationEvent:
>>>>>>>> pid=sipagent) ERROR osgi  - [43] Unexpected problem delivering
>>>>>>>> configuration event to [org.osgi.service.cm.ConfigurationListener,
>>>> id=102,
>>>>>>>> 
>>>> 
>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>>>>>>>> 
>>>>>>>> java.lang.NullPointerException
>>>>>>>>    at
>>>>>>>> 
>>>> 
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>>>>>>>>    at
>>>>>>>> 
>>>> 
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>>>>>>>>    at
>>>>>>>> 
>>>> 
>> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390)
>>>>>>>>    at
>>>>>>>> 
>>>> 
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>>>>>>>>    at
>>>>>>>> 
>>>> 
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>>>>>>>>    at
>>>> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>>>>>>>>    at java.lang.Thread.run(Thread.java:722)
>>>>>>>> 
>>>>>>>> Perhaps a new jira issue shall be opened ?
>>>>>>>> 
>>>>>>>> I think we are getting a CM_LOCATION_CHANGED event because
in our
>>>>>>>> application, we populate configuration admin by doing something
like
>>>> this:
>>>>>>>> 
>>>>>>>> Configuration cfg = cm.getConfiguration(pid, null)
>>>>>>>> if (config.getBundleLocation() != null) {
>>>>>>>>  config.setBundleLocation(null);
>>>>>>>> }
>>>>>>>> 
>>>>>>>> The setBundleLocation(null) is probably useless, but this
leads to a
>>>>>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with
the
>> NPE.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Friday, October 25, 2013, David Jencks <david_jencks@yahoo.com>
>>>> wrote:
>>>>>>>>> Hi Pierre,
>>>>>>>>> 
>>>>>>>>> You are so good at writing useful tests!!
>>>>>>>>> 
>>>>>>>>> I found a place to call setTargets(getProperties()) from
inside
>>>>>>>> ComponentFactoryImpl that would have fewer side effects.
 Could you
>>>> see if
>>>>>>>> this makes your actual applications work properly?  I'm uploading
a
>>>>>>>> snapshot.
>>>>>>>>> 
>>>>>>>>> many thanks
>>>>>>>>> david jencks
>>>>>>>>> 
>>>>>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop <pierre.derop@gmail.com
>>> 
>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi David,
>>>>>>>>>> 
>>>>>>>>>> Since this application is complex, I'm not able to
provide logs
>>>> because
>>>>>>>>>> there are hundreds of components involved which are
not mine, and
>>>> for
>>>>>>>> now,
>>>>>>>>>> I'm not able to diagnose the problem.
>>>>>>>>>> 
>>>>>>>>>> But I have created FELIX-4290, and joined to it an
integration
>> test
>>>> which
>>>>>>>>>> seems to reproduce the kind of problem I think I'm
having in my
>>>>>>>>>> application. I also joined the proposed patch.
>>>>>>>>>> 
>>>>>>>>>> I did not have time to test the patch you suggested
regarding the
>>>>>>>>>> SingleComponentManager.reconfigure method, so let's
continue to
>>>>>>>> investigate
>>>>>>>>>> using the jira issue and the test I attached to it.
>>>>>>>>>> 
>>>>>>>>>> Thanks;
>>>>>>>>>> 
>>>>>>>>>> /Pierre
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks <
>>>> david_jencks@yahoo.com
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Pierre,
>>>>>>>>>>> 
>>>>>>>>>>> I believe you that this code path doesn't work
:-)
>>>>>>>>>>> 
>>>>>>>>>>> I think there should be a less invasive way to
fix this.  By any
>>>> chance
>>>>>>>>>>> can you get a debug-enabled log from when this
problem occurs?
>> It
>>>> would
>>>>>>>>>>> help confirm my suspicions of what might be missing.
>>>>>>>>>>> 
>>>>>>>>>>> FWIW I suspect SingleComponentManager.reconfigure
is missing a
>>>> check for
>>>>>>>>>>> m_factoryProperties here (line 561):
>>>>>>>>>>> 
>>>>>>>>>>>        // nothing to do if there is no configuration
(see
>>>> FELIX-714)
>>>>>>>>>>>        if ( configuration == null &&
m_configurationProperties
>> ==
>>>>>>>>>>> null )
>>>>>>>>>>>        {
>>>>>>>>>>>            log( LogService.LOG_DEBUG, "No configuration
provided
>>>> (or
>>>>>>>>>>> deleted), nothing to do", null );
>>>>>>>>>>>            return;
>>>>>>>>>>>        }
>>>>>>>>>>> 
>>>>>>>>>>> Unless we can't figure anything out for sure
I'd prefer to fix
>> this
>>>>>>>> before
>>>>>>>>>>> the release.
>>>>>>>>>>> 
>>>>>>>>>>> thanks
>>>>>>>>>>> david jencks
>>>>>>>>>>> 
>>>>>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop <
>> pierre.derop@gmail.com
>>>>> 
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>> 
>>>>>>>>>>>> (sorry to do all this noise while you are
releasing ...)
>>>>>>>>>>>> 
>>>>>>>>>>>> We are indeed using factory components; and
today, I finally
>>>> found and
>>>>>>>>>>>> fixed a cycle, using the Apache Service Diagnostic
tool; and I'm
>>>> going
>>>>>>>>>>>> further on but now I'm facing another problem
which I did not
>>>> have in
>>>>>>>> the
>>>>>>>>>>>> scr 1.6.2.
>>>>>>>>>>>> 
>>>>>>>>>>>> So, I would like to discuss about this new
problem with you
>>>> before you
>>>>>>>>>>> redo
>>>>>>>>>>>> a release, in order to decide if this problem
(if there is
>> really
>>>> one
>>>>>>>> ?)
>>>>>>>>>>>> shall be addressed now or after the upcoming
release ?
>>>>>>>>>>>> 
>>>>>>>>>>>> So, in our application, we are extensively
using factory
>>>> components
>>>>>>>>>>>> (@Component(factory=XXX")).
>>>>>>>>>>>> When we instantiate a factory component (using
>>>>>>>>>>>> ComponentFactory.newInstance()), We pass
to the newInstance()
>>>> method
>>>>>>>> some
>>>>>>>>>>>> additional component properties which may
also contain some
>> target
>>>>>>>>>>> filters.
>>>>>>>>>>>> 
>>>>>>>>>>>> This allows to dynamically configure the
filter of some
>> References
>>>>>>>>>>> declared
>>>>>>>>>>>> in the factory component.
>>>>>>>>>>>> in the scr 1.6.2, this mechanism was working
fine. But using
>>>> trunk,
>>>>>>>> this
>>>>>>>>>>>> does not work all the time. Some target filters
seem to be
>>>> correctly
>>>>>>>>>>>> configured, and some others are not (I'm
not sure, actually,
>> it's
>>>> late
>>>>>>>>>>> ...).
>>>>>>>>>>>> 
>>>>>>>>>>>> So, it looks like sometimes, some target
filters are not updated
>>>> before
>>>>>>>>>>>> activating components ? or factory components
?
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm not sure but this might be related to
the old FELIX-3726.
>>>>>>>>>>>> Now, interestingly, I did the following patch
and my application
>>>> is now
>>>>>>>>>>>> working fine: In the  AbstractComponentManager
class, I
>>>> systematically
>>>>>>>>>>>> update target filters, like this:
>>>>>>>>>>>> 
>>>>>>>>>>>> +++
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>> 
>> src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java
>>>>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 


Mime
View raw message