stratos-dev mailing list archives

From: Reka Thirunavukkarasu <r...@wso2.com>
Subject: Re: Testing stratos 4.1: termination behavior / STRATOS-1353
Date: Wed, 06 May 2015 12:26:23 GMT
Hi Martin,

I have fixed the issue in the logic. I have verified it in my setup and it
is working fine. I have now pushed the changes to master in
1ef37dbb2a23583ec406e5fa928af7fabb979b8d, along with the fix for the
monitor startup issue. Please let me know if you identify any other issues
with it.

Thanks,
Reka


On Wed, May 6, 2015 at 10:15 AM, Reka Thirunavukkarasu <reka@wso2.com>
wrote:

> Thanks, Martin, for the info. I will have a look and update further on this.
>
> Thanks,
> Reka
>
> On Wed, May 6, 2015 at 10:11 AM, Martin Eppel (meppel) <meppel@cisco.com>
> wrote:
>
>>  Hi Reka,
>>
>>
>>
>> I re-ran one of the scenarios from below, where the restart fails after
>> terminating an instance.
>>
>> See screenshot [1.] and the attached debug-enabled log / json files
>> (terminate-dependents-restart-fails.zip).
>>
>>
>>
>> The top-level group (c3, c2, c1) defines terminate-dependents; the bottom
>> group (c4, c5) defines terminate-all, as sketched below.
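>>
>> For reference, the nesting described above could look roughly like this in
>> the group definition JSON (an illustrative sketch only: the group names
>> here are placeholders, and the exact field names and startup-order syntax
>> should be checked against the attached json artifacts):
>>
>>     {
>>       "name": "G1",
>>       "cartridges": ["c1", "c2", "c3"],
>>       "dependencies": {
>>         "startupOrders": ["cartridge.c3,cartridge.c2,cartridge.c1"],
>>         "terminationBehaviour": "terminate-dependents"
>>       },
>>       "groups": [
>>         {
>>           "name": "G2",
>>           "cartridges": ["c4", "c5"],
>>           "dependencies": {
>>             "terminationBehaviour": "terminate-all"
>>           }
>>         }
>>       ]
>>     }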
>>
>>
>>
>> Scenario:
>>
>> 1.      Manually terminated c3 -> c2 and c1 are correctly terminated by
>> Stratos (terminate-dependents).
>>
>> 2.      Expected Stratos to restart c3 -> c2 -> c1, but none of them (c3,
>> c2, c1) were restarted.
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> [1.] (inline screenshot of the application structure; image not preserved
>> in the archive)
>>
>> From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> Sent: Tuesday, May 05, 2015 6:32 AM
>> To: dev
>> Subject: Re: Testing stratos 4.1: termination behavior / STRATOS-1353
>>
>>
>>
>> Hi Martin,
>>
>> As I tested, termination-none was working fine in my setup. When one of
>> the members of a group with termination-none behavior got terminated, a
>> replacement member came up without any issue, while the other siblings
>> remained in their existing status. I could see the lines below in the
>> logs that you attached.
>>
>> TID: [0] [STRATOS] [2015-05-01 02:08:36,762]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing member terminated event: [service-name] c1 [cluster-id]
>> s-n-gr-s-G123-t-a-1-Id.c1-0x0.c1.domain [cluster-instance-id]
>> s-n-gr-s-G123-t-a-1-Id-1 [member-id]
>> s-n-gr-s-G123-t-a-1-Id.c1-0x0.c1.domaind283a45e-6500-411d-94e5-48ff19c5a539
>> [network-partition-id] RegionOne [partition-id] whole-region [group-id] null
>> TID: [0] [STRATOS] [2015-05-01 02:08:36,775]  WARN
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  Obsolete
>> member has either been terminated or its obsolete time out has expired and
>> it is removed from obsolete members list:
>> s-n-gr-s-G123-t-a-1-Id.c1-0x0.c1.domaind283a45e-6500-411d-94e5-48ff19c5a539
>> TID: [0] [STRATOS] [2015-05-01 02:08:36,775]  INFO
>> {org.apache.stratos.autoscaler.status.processor.cluster.ClusterStatusTerminatedProcessor}
>> -  Cluster has non terminated [members] and in the [status] Active
>> TID: [0] [STRATOS] [2015-05-01 02:08:36,775]  INFO
>> {org.apache.stratos.messaging.message.processor.topology.MemberTerminatedMessageProcessor}
>> -  Member terminated: [service] c1 [cluster]
>> s-n-gr-s-G123-t-a-1-Id.c1-0x0.c1.domain [member]
>> s-n-gr-s-G123-t-a-1-Id.c1-0x0.c1.domaind283a45e-6500-411d-94e5-48ff19c5a539
>> TID: [0] [STRATOS] [2015-05-01 02:08:47,242]  INFO
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  Executing
>> scaling rule as statistics have been reset
>> TID: [0] [STRATOS] [2015-05-01 02:09:36,220]  INFO
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  Executing
>> scaling rule as statistics have been reset
>> TID: [0] [STRATOS] [2015-05-01 02:10:17,242]  INFO
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  Executing
>> scaling rule as statistics have been reset
>> TID: [0] [STRATOS] [2015-05-01 02:11:06,220]  INFO
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  Executing
>> scaling rule as statistics have been reset
>> TID: [0] [STRATOS] [2015-05-01 02:11:47,243]  INFO
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  Executing
>> scaling rule as statistics have been reset
>>
>> Since the cluster still has non-terminated members, its status stayed
>> Active. That's the reason why no new member is coming up. I'm unable to
>> figure out why the terminated member didn't get removed from the cluster.
>> Are you able to reproduce this issue of the member not coming up under
>> termination-none consistently? Can you share debug-enabled logs for this
>> case as well?
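>>
>> In other words, the terminated-status processing only moves the cluster
>> instance out of Active once every member is gone, roughly like the sketch
>> below (a minimal illustration of the gating logic, not the actual Stratos
>> code; all names are hypothetical):
>>
>>     import java.util.Collection;
>>
>>     // Hypothetical sketch: while any member is still non-terminated, the
>>     // cluster instance keeps its Active status, so the autoscaler never
>>     // requests a replacement for the member that was killed.
>>     final class TerminationGate {
>>         enum MemberStatus { Created, Starting, Active, Terminated }
>>
>>         static boolean allMembersTerminated(Collection<MemberStatus> members) {
>>             for (MemberStatus status : members) {
>>                 if (status != MemberStatus.Terminated) {
>>                     return false; // cluster stays Active
>>                 }
>>             }
>>             return true; // only now is the instance marked Terminated
>>         }
>>     }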
>>
>> I also hit an NPE while testing another use case. I will continue working
>> on it and update.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Sat, May 2, 2015 at 2:53 AM, Martin Eppel (meppel) <meppel@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> I re-ran the scenario with debug logs and deadlock detection turned on;
>> please find the artifacts and logs attached [scenario_term_dependents.zip].
>> For the application, see the screenshot and artifacts [1.]. If I terminate
>> a member (c3) whose group has the termination behavior
>> "terminate-dependents", all dependents are properly terminated but not
>> restarted.
>>
>>
>>
>> Startup sequence is: c5 -> c4 -> G2 -> c3 -> c2 -> c1
>> Scenario: after all instances are active, terminate c3
>>
>> Expected:
>>
>> termination of c2, c1: ok
>>
>> restart of c3 -> c2 -> c1: fail
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> [1.] (inline screenshot of the application; image not preserved in the
>> archive)
>>
>> From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> Sent: Friday, May 01, 2015 12:06 AM
>> To: dev
>> Subject: Re: Testing stratos 4.1: termination behavior / STRATOS-1353
>>
>>
>>
>> Hi Martin,
>>
>> Sorry that I couldn't get a chance to verify the scenarios that you
>> mentioned against the code. I will verify all the other use cases for
>> termination behavior and update you on the progress.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Fri, May 1, 2015 at 8:02 AM, Martin Eppel (meppel) <meppel@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> I merged in your fix for termination behavior and it fixes a couple of
>> issues (terminate-all). However, I encountered a couple of other potential
>> issues:
>>
>>
>>
>> 1.      In the 2-level nested grouping scenario (as defined in
>> STRATOS-1353), terminate-none seems to behave incorrectly (see application
>> structure):
>> when I terminated the instance in G1, the instance c1 is terminated, but
>> no new instance is started.
>> See screenshots [1.], [2.] and the attached artifacts (artifacts_scen_1).
>> After terminating c3 in G3 (the bottom group, which has terminate-all
>> defined), c3 and c2 start up again but c1 still doesn't come up (see log
>> wso2carbon-full.log).
>>
>> 2.      In the 1-level grouping scenario (a different application than in
>> scen_1, see screenshot [4.]), I terminate c3 in group G1. Group G1 defines
>> "terminate-dependents" as its termination behavior, and c1 depends on c2
>> while c2 depends on c3.
>> When I kill c1, c2 is correctly terminated but not restarted (instance
>> terminated after: TID: [0] [STRATOS] [2015-05-01 01:29:22,139]).
>> Instead, I noticed exceptions in the log file (see attached log file and
>> [3.] below).
>> Artifacts are attached (artifacts_scen_2).
>>
>>
>>
>> Regards
>>
>>
>>
>> Martin
>>
>>
>>
>>
>>
>> [1.] Screenshot scenario 1 (inline image not preserved in the archive)
>>
>> [2.] After terminating c1 (inline screenshot not preserved in the archive)
>>
>> [3.] Exceptions
>>
>>
>>
>> java.lang.NullPointerException
>>     at org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor.allParentActive(ParentComponentMonitor.java:536)
>>     at org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor.onChildTerminatedEvent(ParentComponentMonitor.java:429)
>>     at org.apache.stratos.autoscaler.monitor.component.GroupMonitor.onTerminationOfInstance(GroupMonitor.java:459)
>>     at org.apache.stratos.autoscaler.monitor.component.GroupMonitor.onChildStatusEvent(GroupMonitor.java:435)
>>     at org.apache.stratos.autoscaler.monitor.events.builder.MonitorStatusEventBuilder.notifyParent(MonitorStatusEventBuilder.java:86)
>>     at org.apache.stratos.autoscaler.monitor.events.builder.MonitorStatusEventBuilder.handleClusterStatusEvent(MonitorStatusEventBuilder.java:40)
>>     at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.notifyParentMonitor(ClusterMonitor.java:221)
>>     at org.apache.stratos.autoscaler.event.receiver.topology.AutoscalerTopologyEventReceiver$8.onEvent(AutoscalerTopologyEventReceiver.java:317)
>>     at org.apache.stratos.messaging.listener.EventListener.update(EventListener.java:42)
>>     at java.util.Observable.notifyObservers(Observable.java:159)
>>     at org.apache.stratos.messaging.event.EventObservable.notifyEventListeners(EventObservable.java:51)
>>     at org.apache.stratos.messaging.message.processor.topology.ClusterInstanceTerminatedProcessor.doProcess(ClusterInstanceTerminatedProcessor.java:132)
>>     at org.apache.stratos.messaging.message.processor.topology.ClusterInstanceTerminatedProcessor.process(ClusterInstanceTerminatedProcessor.java:64)
>>     at org.apache.stratos.messaging.message.processor.topology.ClusterRemovedMessageProcessor.process(ClusterRemovedMessageProcessor.java:65)
>>     at org.apache.stratos.messaging.message.processor.topology.ClusterInstanceInactivateProcessor.process(ClusterInstanceInactivateProcessor.java:73)
>>     at org.apache.stratos.messaging.message.processor.topology.ClusterInstanceActivatedProcessor.process(ClusterInstanceActivatedProcessor.java:73)
>>     at org.apache.stratos.messaging.message.processor.topology.ClusterCreatedMessageProcessor.process(ClusterCreatedMessageProcessor.java:67)
>>     at org.apache.stratos.messaging.message.processor.topology.ApplicationClustersRemovedMessageProcessor.process(ApplicationClustersRemovedMessageProcessor.java:63)
>>     at org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:65)
>>     at org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
>>     at org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
>>     at org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
>>     at org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
>>     at org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
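>>
>> The NPE at ParentComponentMonitor.allParentActive (line 536) looks like a
>> lookup returning null for an instance that was already removed while the
>> terminated event is being processed. A defensive check of roughly this
>> shape would avoid the crash (a hypothetical sketch with illustrative
>> names, not the actual Stratos code or fix):
>>
>>     import java.util.Map;
>>
>>     final class ParentStatusCheck {
>>         // Hypothetical guard: if the parent's context no longer holds the
>>         // instance, treat it as not active instead of dereferencing null.
>>         static boolean allParentActive(Map<String, String> instanceStatusById,
>>                                        String instanceId) {
>>             String status = instanceStatusById.get(instanceId);
>>             if (status == null) {
>>                 return false; // instance already removed from the context
>>             }
>>             return "Active".equals(status);
>>         }
>>     }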
>>
>>
>>
>>
>>
>> [4.] Screenshot of the application (inline image not preserved in the
>> archive)
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007
