stratos-dev mailing list archives

From "Michiel Blokzijl (mblokzij)" <mblok...@cisco.com>
Subject Re: Topology inconsistent
Date Wed, 22 Apr 2015 13:58:09 GMT
Hi Imesh,

> Ideally the cartridge agent should wait for the Complete Topology event only once in its lifecycle.
> If it is waiting more than once, then there is an issue.


That’s not the problem; it only waits once for the Complete Topology event.

To me it looks like the topology is never updated, or if it is, it’s not clear to me
how that happens. The Python cartridge agent, for example, does call an
‘update’ method:
https://github.com/apache/stratos/blob/22fdf78be8a62312a65b23e017f0de20cfad82b2/components/org.apache.stratos.python.cartridge.agent/src/main/python/cartridge.agent/cartridge.agent/agent.py#L250

I don’t see anything similar in the Java cartridge agent.

Please could someone confirm whether this is the case, and perhaps explain how updating the
topology is supposed to work in the Java cartridge agent?
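
To make the question concrete, here is a minimal sketch of the kind of update I would expect to find somewhere in the agent. It is written in Python purely for illustration; the `Topology` class and its method names are hypothetical and are not taken from the Stratos code base. A plain re-entrant lock stands in for the topology manager's write lock. It shows a member-terminated lookup failing when a later member event was never applied to the stored topology, and succeeding when it was.

```python
import threading


class Topology:
    """Hypothetical in-memory topology: a set of member ids guarded by a lock."""

    def __init__(self):
        # Stand-in for a write lock; Python's stdlib has no read-write lock.
        self._lock = threading.RLock()
        self._members = set()

    def replace_all(self, member_ids):
        # Apply a full snapshot: swap in the complete set of members.
        with self._lock:
            self._members = set(member_ids)

    def add_member(self, member_id):
        # Apply an incremental event to the stored state.
        with self._lock:
            self._members.add(member_id)

    def remove_member(self, member_id):
        # Return False when the member is unknown, which corresponds to the
        # "member id not found in topology" situation in the logs.
        with self._lock:
            if member_id not in self._members:
                return False
            self._members.remove(member_id)
            return True

    def has_member(self, member_id):
        with self._lock:
            return member_id in self._members


# Case 1: the snapshot is stored, but a later member event is never applied.
stale = Topology()
stale.replace_all(["member-1"])
stale_result = stale.remove_member("member-2")

# Case 2: the later member event is applied before the termination arrives.
maintained = Topology()
maintained.replace_all(["member-1"])
maintained.add_member("member-2")
maintained_result = maintained.remove_member("member-2")

print(stale_result, maintained_result)
```

If nothing in the agent ever performs the equivalent of `add_member` on the shared state, every member that joined after the initial snapshot would hit the first case.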

Best regards,

Michiel

On 20 Apr 2015, at 19:05, Imesh Gunaratne <imesh@apache.org> wrote:

> Hi Michiel,
> 
> It's a pleasure! My guess is that either the cartridge agent has been restarted or there
> is a bug in its logic.
> 
> Ideally the cartridge agent should wait for the Complete Topology event only once in its lifecycle.
> If it is waiting more than once, then there is an issue.
> 
> Thanks
> 
> On Mon, Apr 20, 2015 at 11:06 PM, Michiel Blokzijl (mblokzij) <mblokzij@cisco.com> wrote:
> Hi Imesh,
> 
> Thanks for replying,
> 
>> This issue might occur if the cartridge agent starts processing member events before
>> consuming the Complete Topology event.
> 
> 
> The issue happened long after that. I had Stratos running for a day or so, and in the
> logs I saw some “waiting for complete topology event ..” messages, but they went away pretty quickly
> (well before this happened).
> 
> Is this the code that’s supposed to do the updates?
> https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L328
> 
> I ask because I don’t see anything there that actually updates any state (beyond function-local
> variables like ‘env’).
> 
> Michiel
> 
> On 20 Apr 2015, at 18:13, Imesh Gunaratne <imesh@apache.org> wrote:
> 
>> Hi Michiel,
>> 
>> This issue might occur if the cartridge agent starts processing member events before
>> consuming the Complete Topology event.
>> 
>> This is how the topology gets initialized in any component that listens to the topology
>> topic in the message broker: first of all, when the component starts up, it waits to receive
>> the Complete Topology event. This event is published periodically by the Cloud Controller
>> and contains the entire topology at a given moment in time.
>> 
>> Once it is received, the component initializes the local topology and starts listening
>> to other events. Since the Complete Topology event has provided the latest state of the topology,
>> the component can now consume any other event published afterwards.
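
The sequence described above can be sketched roughly as follows. This is only an illustration in Python, not Stratos code: the `TopologyListener` class, the event type strings and the member ids are all hypothetical. Two behaviours are assumptions that the description does not confirm: events arriving before the first Complete Topology event are set aside rather than applied, and Complete Topology events published after initialization are ignored.

```python
class TopologyListener:
    """Hypothetical listener that initializes once, then applies later events."""

    def __init__(self):
        self.initialized = False
        self.members = set()
        self.skipped = []  # events seen before the first complete snapshot

    def on_event(self, event_type, payload):
        if not self.initialized:
            if event_type == "complete_topology":
                # First snapshot: build the local topology exactly once.
                self.members = set(payload)
                self.initialized = True
            else:
                # Assumption: anything arriving earlier is not applied.
                self.skipped.append((event_type, payload))
            return

        # After initialization, apply incremental events to the local state.
        if event_type == "member_activated":
            self.members.add(payload)
        elif event_type == "member_terminated":
            self.members.discard(payload)
        # Assumption: later "complete_topology" events are ignored here.


listener = TopologyListener()
listener.on_event("member_activated", "m2")     # too early, set aside
listener.on_event("complete_topology", ["m1"])  # initializes the local topology
listener.on_event("member_activated", "m2")     # applied
listener.on_event("member_terminated", "m1")    # applied
print(sorted(listener.members), listener.skipped)
```

The point of the sketch is that the local topology only stays consistent if the branch after initialization really mutates the stored state for every member event.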
>> 
>> Thanks
>> 
>> 
>> 
>> On Mon, Apr 20, 2015 at 7:44 PM, Michiel Blokzijl (mblokzij) <mblokzij@cisco.com> wrote:
>> Hi,
>> I’m looking at the Stratos 4.0.0 code, and I’m having an issue with
>> the cartridge agent. It complains about the topology being inconsistent; the complaint is
>> triggered by this code [1].
>> 
>> This causes the extension handler not to fire for cartridges going down.
>> 
>> [2015-04-19 07:19:22,486]  INFO - [MemberTerminatedMessageProcessor] Member terminated: [service] XXX [cluster] XXX [member] XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
>> [2015-04-19 07:19:22,486]  INFO - [DefaultExtensionHandler] Member terminated event received: [service] XXX [cluster] XX [member] XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
>> [2015-04-19 07:19:22,486] ERROR - [ExtensionUtils] Member id not found in topology [member] XXXX.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
>> [2015-04-19 07:19:22,486] ERROR - [DefaultExtensionHandler] Topology is inconsistent...failed to execute member terminated event
>> 
>> Any idea what’s going wrong here?
>> 
>> I assume the topology isn’t being maintained correctly for some reason, but I haven’t
>> quite figured out how, or whether, the topology is being maintained at all. The complete
>> topology event handler [2], for example, doesn’t actually update the internally stored
>> topology. Nothing in the cartridge agent calls the topology manager’s acquireWriteLock
>> function.
>> 
>> Best regards,
>> 
>> Michiel
>> 
>> [1] https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L374
>> 
>> [2] https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L328
>> 
>> 
>> --
>> Imesh Gunaratne
>> 
>> Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
> 
> 
> 
> 
> --
> Imesh Gunaratne
> 
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos

