karaf-issues mailing list archives

From "Kurt Westerfeld (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KARAF-6224) Race condition in BaseActivator on first launch
Date Tue, 02 Apr 2019 17:33:02 GMT
Kurt Westerfeld created KARAF-6224:
--------------------------------------

             Summary: Race condition in BaseActivator on first launch
                 Key: KARAF-6224
                 URL: https://issues.apache.org/jira/browse/KARAF-6224
             Project: Karaf
          Issue Type: Bug
          Components: karaf
    Affects Versions: 4.2.4, 4.1.7, 4.0.10
            Reporter: Kurt Westerfeld


We run several Karaf containers on a single machine with a large number of cores (20).
Because the core count is high, this may be a hard problem to reproduce elsewhere.
We have customized the RMI and JMX ports for each container so that they do not conflict.
However, after the first Karaf VM is launched and claims ports 1099/44444, the second VM
briefly attempts to claim the same ports before its customized configuration has been read
from the ${karaf.etc} directory. You can see the management bundle start, followed
immediately by a configuration update with the corrected values.

Looking over BaseActivator, it seems that a thread is created to dispatch the initialization,
and sometimes this thread finds the "config" field still null because the asynchronous managed
service event has not yet arrived. In that case the configuration is missing and defaults are
used. As a result, the service temporarily attempts to use ports 1099 and 44444 until the first
managed service event arrives through the updated() method. Immediately after that, the service
reconfigures and uses the proper customized values.
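
The ordering described above can be illustrated with a small, deterministic sketch. This is
not the Karaf source: the class ActivatorRaceSketch, the portsAttempted list, and the example
port 1199 are hypothetical, and the property key "rmiRegistryPort" is assumed to match the
management configuration. It only shows what happens when run() executes before updated().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an activator; not the actual BaseActivator code.
final class ActivatorRaceSketch {
    static final int DEFAULT_RMI_REGISTRY_PORT = 1099;

    // Null until the managed service callback delivers a configuration.
    private volatile Map<String, String> config;

    // Records every port the sketch would try to bind, in order.
    final List<Integer> portsAttempted = new ArrayList<>();

    // Stand-in for the asynchronous managed service callback.
    void updated(Map<String, String> props) {
        config = props;
        run(); // reconfigure with the delivered values
    }

    // Stand-in for the initialization that is dispatched on its own thread.
    void run() {
        Map<String, String> c = config;
        String value = (c == null) ? null : c.get("rmiRegistryPort");
        int port = (value == null) ? DEFAULT_RMI_REGISTRY_PORT : Integer.parseInt(value);
        synchronized (portsAttempted) {
            portsAttempted.add(port);
        }
    }

    public static void main(String[] args) {
        ActivatorRaceSketch activator = new ActivatorRaceSketch();
        activator.run(); // initialization wins the race: config is still null
        activator.updated(Collections.singletonMap("rmiRegistryPort", "1199")); // arrives late
        System.out.println(activator.portsAttempted); // prints [1099, 1199]
    }
}
```

The first entry is the default port that the second container should never have tried; the
second entry is the customized value applied once the configuration arrives.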

This is a problem for us because during this temporary window a client can mistakenly
connect to the wrong container. We use JMX over RMI to perform a number of management
operations, and this makes initial startup unreliable. Our three Karaf containers have some
interdependencies, and this temporary condition causes problems with them.

This problem occurs less often on subsequent restarts, which suggests that the initial
provisioning of ${karaf.etc} is racing here. We have, however, seen it happen at any time,
although more rarely. We believe the high core count of the server is what exposes the
race condition.

The suggested fix is for run() to call config admin and read the configuration itself if
this.config is null. This would handle the race here, but it might cause other bad
interactions with config admin. I am not sure.
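
A minimal sketch of that fallback, under stated assumptions: ConfigSource is a hypothetical
stand-in for a config admin lookup (it is not the OSGi API), the PID
"org.apache.karaf.management" and the key "rmiRegistryPort" are assumed, and
resolveRmiRegistryPort() only models the decision run() would make. It is not a patch for
BaseActivator.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical model of "pull the configuration if none has been pushed yet".
final class ConfigFallbackSketch {
    /** Hypothetical stand-in for a config admin lookup; not the OSGi API. */
    interface ConfigSource {
        Map<String, String> read(String pid); // may return null if nothing is stored
    }

    static final String PID = "org.apache.karaf.management"; // assumed PID
    static final int DEFAULT_RMI_REGISTRY_PORT = 1099;

    private final ConfigSource source;

    // Set by updated(); null until the managed service event arrives.
    private volatile Map<String, String> config;

    ConfigFallbackSketch(ConfigSource source) {
        this.source = source;
    }

    // Stand-in for the managed service callback.
    void updated(Map<String, String> props) {
        this.config = props;
    }

    /** Order of preference: pushed config, then pulled config, then the default. */
    int resolveRmiRegistryPort() {
        Map<String, String> c = config;
        if (c == null) {
            c = source.read(PID); // the fallback suggested in this report
        }
        String value = (c == null) ? null : c.get("rmiRegistryPort");
        return (value == null) ? DEFAULT_RMI_REGISTRY_PORT : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        ConfigSource stored = pid -> Collections.singletonMap("rmiRegistryPort", "1199");
        ConfigFallbackSketch activator = new ConfigFallbackSketch(stored);
        // updated() has not been called, yet the customized port is already used.
        System.out.println(activator.resolveRmiRegistryPort()); // prints 1199
    }
}
```

The fallback only helps if the customized configuration is already stored when run() executes.
If ${karaf.etc} has not been loaded yet, the lookup returns nothing and the defaults are still
used, so this narrows the window rather than guaranteeing it is closed.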



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
