karaf-issues mailing list archives

From "Ioannis Canellos (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KARAF-852) Cellar node registrations can be lost for groups and distributed service endpoints.
Date Tue, 06 Sep 2011 08:30:10 GMT
Cellar node registrations can be lost for groups and distributed service endpoints.

                 Key: KARAF-852
                 URL: https://issues.apache.org/jira/browse/KARAF-852
             Project: Karaf
          Issue Type: Bug
          Components: cellar-core
    Affects Versions: cellar-2.2.1, cellar-2.2.2
            Reporter: Ioannis Canellos
             Fix For: cellar-3.0.0, cellar-2.2.3

Groups are stored in a distributed collection. Each Group object internally keeps a set of
members (the nodes that are registered to the group). If two or more nodes register to a group
simultaneously, the Group object written by one node is overwritten by the other. The members
that were registered in the overwritten object are then lost.
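
The overwrite can be reproduced without a cluster. The following is a minimal sketch, not Cellar
code: the Group class, the read helper, and the node names are hypothetical, and a local
ConcurrentHashMap stands in for the distributed collection. It assumes that reading from the
collection hands each node its own copy of the object, and it fixes the interleaving by hand so
the outcome is deterministic.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class LostRegistrationDemo {
    /** Hypothetical stand-in for a Cellar group: a name plus its member node ids. */
    static final class Group {
        final String name;
        final Set<String> members = new HashSet<>();

        Group(String name) { this.name = name; }

        Group copy() {
            Group g = new Group(name);
            g.members.addAll(members);
            return g;
        }
    }

    // Local stand-in for the distributed collection of groups (assumption).
    static final Map<String, Group> groups = new ConcurrentHashMap<>();

    /** Returns a private copy, the way a node would see a deserialized value. */
    static Group read(String name) {
        Group g = groups.get(name);
        return g == null ? new Group(name) : g.copy();
    }

    /** Both nodes read before either writes, so the second put wins. */
    static Set<String> simulateRace() {
        groups.clear();
        Group seenByA = read("default");
        Group seenByB = read("default");
        seenByA.members.add("node-a");
        seenByB.members.add("node-b");
        groups.put("default", seenByA); // node-a registers
        groups.put("default", seenByB); // node-b overwrites node-a's object
        return groups.get("default").members;
    }

    public static void main(String[] args) {
        System.out.println(simulateRace()); // prints [node-b]
    }
}
```

Only node-b survives, although both nodes registered successfully from their own point of view.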

Exactly the same issue can occur with nodes registering for remote services.

Please note that this is also a problem when two clusters merge.

This issue will probably never trigger when instances are started manually or on relatively
small clusters, but it is a problem in cloud deployments of 10+ nodes.

One solution would be to acquire a distributed lock before accessing these collections, but this
would not solve the case of clusters that merge.
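
As a rough illustration of the locking approach, the sketch below serializes the
read-modify-write cycle on the group entry. It is not Cellar code: a local ReentrantLock and a
ConcurrentHashMap are stand-ins for the distributed lock and the distributed collection, and all
names are hypothetical. As noted above, a lock of this kind only helps within a single cluster.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockedRegistrationSketch {
    // Local stand-ins for the distributed collection and distributed lock (assumptions).
    static final Map<String, Set<String>> groups = new ConcurrentHashMap<>();
    static final Lock groupLock = new ReentrantLock();

    /** Read, modify and write back the member set while holding the lock. */
    static void register(String group, String node) {
        groupLock.lock();
        try {
            Set<String> current = groups.get(group);
            Set<String> updated = current == null ? new HashSet<>() : new HashSet<>(current);
            updated.add(node);
            groups.put(group, updated);
        } finally {
            groupLock.unlock();
        }
    }

    /** Registers the given number of nodes at the same time; returns the member count. */
    static int registerConcurrently(int nodes) {
        groups.clear();
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < nodes; i++) {
            final String node = "node-" + i;
            pool.execute(() -> {
                try {
                    start.await(); // release all registrations together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                register("default", node);
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("registrations did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return groups.get("default").size();
    }

    public static void main(String[] args) {
        System.out.println(registerConcurrently(12)); // prints 12
    }
}
```
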
The alternative solution would be to refactor Cellar and keep node registrations in separate
collections. So instead of having a collection of groups and keeping the nodes inside the Group
object, we could have a collection of groups and a multimap of nodes (the key would be the group
name and the values would be the nodes). We will first need to check how Hazelcast handles
multimap merges. If the value objects are overridden instead of appended when multimaps merge,
we will have to work this out the other way around, that is, by making the groups and
distributed services part of the node.
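
To make the proposed layout concrete, here is a minimal sketch that keeps memberships in a
multimap keyed by group name. It is not Cellar or Hazelcast code: the Membership class is a
hypothetical local stand-in built on ConcurrentHashMap, and mergeFrom only models the "append"
behaviour we would hope for. How Hazelcast actually merges multimaps still has to be verified.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class GroupMembershipSketch {
    /** Local stand-in for a distributed multimap: group name -> node ids (assumption). */
    static final class Membership {
        final Map<String, Set<String>> nodesByGroup = new ConcurrentHashMap<>();

        /** Adds one node to one group without rewriting the other members. */
        void register(String group, String node) {
            nodesByGroup.computeIfAbsent(group, g -> ConcurrentHashMap.newKeySet()).add(node);
        }

        /** Sorted snapshot of the members of a group. */
        Set<String> members(String group) {
            return new TreeSet<>(nodesByGroup.getOrDefault(group, Set.of()));
        }

        /** "Append" merge: union the other side's values into this one, key by key. */
        void mergeFrom(Membership other) {
            other.nodesByGroup.forEach(
                (group, nodes) -> nodes.forEach(node -> register(group, node)));
        }
    }

    /** Two clusters each hold one registration for the same group, then merge. */
    static Set<String> mergeTwoClusters() {
        Membership clusterA = new Membership();
        clusterA.register("default", "node-a");
        Membership clusterB = new Membership();
        clusterB.register("default", "node-b");
        clusterA.mergeFrom(clusterB);
        return clusterA.members("default");
    }

    public static void main(String[] args) {
        System.out.println(mergeTwoClusters()); // prints [node-a, node-b]
    }
}
```

Because each registration is its own key/value entry, adding a node never rewrites the entries
of other nodes. If a merge overrode values per key instead of appending them, node-a would be
lost here as well, which is why the fallback of storing groups inside the node is mentioned.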
