karaf-issues mailing list archives

From Jean-Baptiste Onofré (JIRA) <j...@apache.org>
Subject [jira] [Assigned] (KARAF-6224) Race condition in BaseActivator on first launch
Date Wed, 17 Apr 2019 13:22:00 GMT

     [ https://issues.apache.org/jira/browse/KARAF-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Baptiste Onofré reassigned KARAF-6224:

    Assignee: Jean-Baptiste Onofré

> Race condition in BaseActivator on first launch
> -----------------------------------------------
>                 Key: KARAF-6224
>                 URL: https://issues.apache.org/jira/browse/KARAF-6224
>             Project: Karaf
>          Issue Type: Bug
>          Components: karaf
>    Affects Versions: 4.0.10, 4.1.7, 4.2.4
>            Reporter: Kurt Westerfeld
>            Assignee: Jean-Baptiste Onofré
>            Priority: Critical
> We run several Karaf containers on a single machine with a large number of cores (20). Because the core count is this high, the problem may be hard to reproduce. We have customized the RMI and JMX ports for each container so that they do not conflict. However, after the first Karaf VM is launched and claims ports 1099/44444, the second VM briefly tries to claim the same ports before its customized configuration has been read from the ${karaf.etc} directory. You can see the management bundle start and then immediately receive a configuration update with the corrected values.
> Looking over BaseActivator, it seems that a thread is created to dispatch the initialization, and sometimes this thread finds the field "config" still null because the asynchronous managed service event has not yet arrived. In that case the configuration is missing and the defaults are used, so the container temporarily tries to use ports 1099 and 44444 until the first managed service event arrives through the updated() method. Immediately after that, the service reconfigures itself with the proper customized values.
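The ordering problem described above can be reproduced in isolation with a small Java model. This is a hypothetical sketch, not Karaf's BaseActivator: the class name, the property keys rmiRegistryPort/rmiServerPort (modelled on Karaf's management configuration, treated here as an assumption) and the custom ports 1199/44445 are all illustrative. It forces the unlucky interleaving in which the initialization thread reads a null "config" before updated() is delivered.

```java
import java.util.Dictionary;
import java.util.Hashtable;

// Hypothetical model of the race described in KARAF-6224. This is NOT Karaf's
// BaseActivator; the names only mirror the terms used in the report.
class ManagementActivatorModel implements Runnable {
    // Fallback ports quoted in the report.
    static final int DEFAULT_RMI_REGISTRY_PORT = 1099;
    static final int DEFAULT_RMI_SERVER_PORT = 44444;

    // Written by the managed-service callback, read by the init thread.
    private volatile Dictionary<String, Object> config;

    // Ports the modelled service would try to bind, kept for inspection.
    volatile int boundRegistryPort = -1;
    volatile int boundServerPort = -1;

    // Stand-in for ManagedService.updated(): delivered asynchronously,
    // possibly after run() has already executed.
    void updated(Dictionary<String, Object> properties) {
        this.config = properties;
        run(); // reconfigure with the delivered values
    }

    // Stand-in for the initialization thread dispatched by the activator.
    @Override
    public void run() {
        Dictionary<String, Object> c = this.config;
        // If updated() has not been delivered yet, c is null and the defaults win.
        boundRegistryPort = intOrDefault(c, "rmiRegistryPort", DEFAULT_RMI_REGISTRY_PORT);
        boundServerPort = intOrDefault(c, "rmiServerPort", DEFAULT_RMI_SERVER_PORT);
    }

    // Hypothetical property keys; values are parsed from their string form.
    static int intOrDefault(Dictionary<String, Object> c, String key, int def) {
        if (c == null) {
            return def;
        }
        Object v = c.get(key);
        return v == null ? def : Integer.parseInt(v.toString().trim());
    }

    public static void main(String[] args) throws InterruptedException {
        ManagementActivatorModel activator = new ManagementActivatorModel();

        // Force the unlucky interleaving: the init thread finishes before updated().
        Thread init = new Thread(activator);
        init.start();
        init.join();
        System.out.println("before updated(): "
                + activator.boundRegistryPort + "/" + activator.boundServerPort);

        // The configuration read from ${karaf.etc} arrives a moment later.
        Dictionary<String, Object> custom = new Hashtable<>();
        custom.put("rmiRegistryPort", "1199");
        custom.put("rmiServerPort", "44445");
        activator.updated(custom);
        System.out.println("after updated(): "
                + activator.boundRegistryPort + "/" + activator.boundServerPort);
    }
}
```

Running main prints the default ports first and the customized ports second, which matches the transient behavior the report describes.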
> This is a problem for us because this transient state can occasionally cause a client to connect to the wrong container. We use JMX over RMI to perform a number of management operations, and this behavior makes the initial startup unreliable. Our three Karaf containers have interdependencies that this transient condition disrupts.
> The problem occurs less often on subsequent restarts, which suggests that the initial provisioning of ${karaf.etc} is part of the race. We have nonetheless seen it happen at any time, although more rarely. We believe the high core count of the server it runs on is what exposes the race condition.
> Suggested fix: in run(), if this.config is null, call Config Admin to read the configuration. This would handle the race here, but it might cause other bad interactions with Config Admin; we are not sure.
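The suggested fix can be sketched as follows, again as a hypothetical model rather than Karaf code and not necessarily how the issue was resolved. The synchronous Config Admin read is replaced by an injected Supplier (a stand-in, assumed to return null when nothing has been provisioned). In a real bundle the lookup would presumably go through org.osgi.service.cm.ConfigurationAdmin; listConfigurations with a service.pid filter is one option, since getConfiguration(pid) creates an empty configuration when none exists, which may be one of the "bad interactions" the reporter worries about. Property keys and port values are the same illustrative assumptions as before.

```java
import java.util.Dictionary;
import java.util.Hashtable;
import java.util.function.Supplier;

// Hypothetical sketch of the suggested fix: fall back to a synchronous lookup
// in run() when the managed-service callback has not delivered "config" yet.
class FallbackActivatorModel implements Runnable {
    static final int DEFAULT_RMI_REGISTRY_PORT = 1099;
    static final int DEFAULT_RMI_SERVER_PORT = 44444;

    // Stand-in for reading the PID from Config Admin; returns null when no
    // configuration has been provisioned yet (assumption).
    private final Supplier<Dictionary<String, Object>> configAdminLookup;

    private Dictionary<String, Object> config; // guarded by "this"
    int boundRegistryPort = -1;                // guarded by "this"
    int boundServerPort = -1;                  // guarded by "this"

    FallbackActivatorModel(Supplier<Dictionary<String, Object>> configAdminLookup) {
        this.configAdminLookup = configAdminLookup;
    }

    // Asynchronous callback; always wins over the fallback because it
    // overwrites whatever run() may have read earlier.
    synchronized void updated(Dictionary<String, Object> properties) {
        this.config = properties;
        apply();
    }

    @Override
    public synchronized void run() {
        if (config == null) {
            // The race window: read the configuration directly instead of
            // starting with defaults.
            config = configAdminLookup.get();
        }
        apply();
    }

    private void apply() {
        boundRegistryPort = intOrDefault(config, "rmiRegistryPort", DEFAULT_RMI_REGISTRY_PORT);
        boundServerPort = intOrDefault(config, "rmiServerPort", DEFAULT_RMI_SERVER_PORT);
    }

    static int intOrDefault(Dictionary<String, Object> c, String key, int def) {
        if (c == null) {
            return def;
        }
        Object v = c.get(key);
        return v == null ? def : Integer.parseInt(v.toString().trim());
    }

    public static void main(String[] args) {
        // Configuration already provisioned, but updated() not yet delivered.
        Dictionary<String, Object> stored = new Hashtable<>();
        stored.put("rmiRegistryPort", "1199");
        stored.put("rmiServerPort", "44445");

        FallbackActivatorModel activator = new FallbackActivatorModel(() -> stored);
        activator.run(); // init thread runs first, yet uses the stored values
        System.out.println(activator.boundRegistryPort + "/" + activator.boundServerPort);
    }
}
```

Because both methods are synchronized, a configuration delivered through updated() before run() is never overwritten by the fallback lookup; if updated() arrives afterwards, the service is simply reconfigured with the same or newer values.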

This message was sent by Atlassian JIRA
