Hi Lasindu, Reka

 

After incorporating the latest fixes (which solve part of the “application removal issues”, thanks!), I still have an issue with application removal in a group scaling scenario. Please see the application structure [1a. …] and the attached zip file with the artifacts (wso2carbon.log, cartridge-group.json, application.json).

 

Please also note that I updated the application accordingly and removed the cartridge deployment policy where a group deployment policy was defined; please review the application.json for correctness.

 

Observation: After all VMs go active (including the ones spun up by group scaling) and the “application removal process” is started, the application fails to be removed. The majority of the VMs are terminated, but there are 1 or 2 VMs which seem to get started when the application removal process is initiated. These VMs typically get stuck in the “Started” or “Initialized” state and are not terminated.

Please note the group scaling progression (1a. … 1c.); what I noted is that the groups which were created by the autoscaler don’t seem to go into the active state, see also the attached log “wso2carbon-all-active.log”.

 

After starting the application removal process, see [1d.] and the logs “wso2carbon-after-remove.log”, “wso2carbon-after-remove-1.log”, and the final log “wso2carbon-after-remove-2.log”.

 

Thanks

 

Martin

 

 

[1a.]

 

 

[1b.]

 

 

[1c.] all cartridges active, but it seems the “auto-scaled” groups are not in the active state

 

 

[1d.]

 

 

 

 

[1e.] after the application removal process started, instance stuck in the “Starting” state, see log wso2carbon-after-remove-1.log

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Thanks

 

Martin

 

 

 

 

 

 

 

From: Martin Eppel (meppel)
Sent: Friday, June 12, 2015 10:16 PM
To: dev
Cc: 'Ryan Du Plessis (rdupless)'
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

Hi Lasindu,

 

I have run some tests, and the issue (failure to remove an application when an instance is terminated and restarted) seems to be resolved.

 

However, I do seem to see some issue with group scaling and application removal, but I still have to run some tests next week to get a better understanding (not sure yet whether this is an issue or not). I will keep you posted.

 

Thanks

 

Martin

 

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Friday, June 12, 2015 9:41 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

Hi Martin,

 

I have fixed the above issue in commit 03de83172309c2932075fb5284c120ca610bbf0a. Please take a pull from master and try out your scenario again to see if undeployment/redeployment works as expected.

 

Thanks,

 

 

On Thu, Jun 11, 2015 at 11:52 PM, Lasindu Charith <lasindu@wso2.com> wrote:

Hi Martin,

 

I guess my previous observation was incorrect. The root cause of the above issue is that ClusterStatusTerminatedProcessor does not send a ClusterTerminatedEvent for all 3 clusters. It only sends 1 and fails to send the other 2 ClusterTerminated events. This is because, when the application is activated again, the ClusterLevelPartitionContext is added twice to the ClusterInstanceContext. This makes the if condition at [1] fail when ClusterStatusTerminatedProcessor tries to find out whether the cluster monitor has any non-terminated members before sending the ClusterTerminated event. I will try to find a proper solution and update the thread.
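To illustrate the failure mode (purely a hypothetical sketch with made-up names, not the actual ClusterStatusTerminatedProcessor / ClusterLevelPartitionContext code): registering the same partition context twice can keep a “has non-terminated members” check from ever going false, and a duplicate guard on the add path is one way to avoid it:

import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for ClusterLevelPartitionContext / ClusterInstanceContext.
class PartitionCtx {
    final String partitionId;
    int nonTerminatedMembers;

    PartitionCtx(String partitionId, int nonTerminatedMembers) {
        this.partitionId = partitionId;
        this.nonTerminatedMembers = nonTerminatedMembers;
    }
}

class InstanceCtx {
    private final List<PartitionCtx> partitionContexts = new ArrayList<>();

    // Guard against registering the same partition context twice when the
    // application is activated a second time.
    void addPartitionContext(PartitionCtx ctx) {
        boolean alreadyAdded = partitionContexts.stream()
                .anyMatch(p -> p.partitionId.equals(ctx.partitionId));
        if (!alreadyAdded) {
            partitionContexts.add(ctx);
        }
    }

    // The check the terminated processor relies on: a stale duplicate entry in
    // the list can keep this true, so the ClusterTerminated event is never sent.
    boolean hasNonTerminatedMembers() {
        return partitionContexts.stream().anyMatch(p -> p.nonTerminatedMembers > 0);
    }
}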

 

 

 

Thanks,

 

 

On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <meppel@cisco.com> wrote:

Is there any conclusion on how to fix this?

 

Thanks

 

 

Martin

 

From: Lahiru Sandaruwan [mailto:lahirus@wso2.com]
Sent: Wednesday, June 10, 2015 6:55 PM
To: dev
Cc: Reka Thirunavukkarasu


Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

Hi Imesh,

 

The following could be the possible reason for not un-deploying when a member was auto-healed:

 

  • The particular cluster in which the member was auto-healed is terminated before the others (while the others are in Terminating state)

or

  • The particular cluster in which the member was auto-healed is still Terminating while the others are in Terminated state

One of those two cases could happen even if the member was not auto-healed (in the case of groups where one group is very complex and the others are simple), because currently we check whether all the clusters and groups are in Terminating status when the parent group is Terminating, which is wrong.
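As a rough illustration only (hypothetical names and simplified statuses, not the actual GroupStatus processor code), the difference between the strict check described above and a more tolerant one might look like this:

import java.util.List;

enum InstanceStatus { CREATED, ACTIVE, TERMINATING, TERMINATED }

final class TerminatingCheckSketch {

    // Strict check as described above: every child must be exactly TERMINATING,
    // so a child that has already reached TERMINATED blocks the parent group.
    static boolean allChildrenTerminatingStrict(List<InstanceStatus> children) {
        return children.stream().allMatch(s -> s == InstanceStatus.TERMINATING);
    }

    // Tolerant variant: children that have already terminated are also accepted
    // while the parent group is terminating.
    static boolean allChildrenTerminatingOrTerminated(List<InstanceStatus> children) {
        return children.stream()
                .allMatch(s -> s == InstanceStatus.TERMINATING
                        || s == InstanceStatus.TERMINATED);
    }
}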

 

Thanks.

 

On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <imesh@apache.org> wrote:

Do we know why this only happens if a member was forcefully terminated and auto-healed?

 

On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <lasindu@wso2.com> wrote:

Hi all,

 

The cause of the above issue seems to be as follows.

GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor process the event only if all the group instances and cluster instances are in Terminated state or in Terminating state, respectively [1][2]. But there can be situations (such as the above) where some group instances are in Terminated state and some in Terminating state by the time the GroupStatusProcessorChain is executed. For such scenarios, both the GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor executions are skipped, and the GroupStatusInactiveProcessor prints the "No possible state change found" warning.
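Just to make the fall-through concrete, here is a compact, self-contained sketch (simplified types, not the real GroupStatusProcessorChain) of how a mixed set of Terminated/Terminating children skips every processor and only yields the warning:

import java.util.List;
import java.util.function.Predicate;

enum ChildStatus { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

final class GroupStatusChainSketch {

    static void process(List<ChildStatus> children) {
        // Each predicate mimics one processor in the chain; a processor only
        // "handles" the event when its all-or-nothing condition holds.
        List<Predicate<List<ChildStatus>>> chain = List.of(
                c -> c.stream().allMatch(s -> s == ChildStatus.TERMINATED),   // terminated processor
                c -> c.stream().allMatch(s -> s == ChildStatus.TERMINATING),  // terminating processor
                c -> c.stream().anyMatch(s -> s == ChildStatus.INACTIVE)      // inactive processor
        );

        boolean handled = chain.stream().anyMatch(p -> p.test(children));
        if (!handled) {
            // A mix of TERMINATING and TERMINATED children ends up here.
            System.out.println("WARN: No possible state change found");
        }
    }

    public static void main(String[] args) {
        process(List.of(ChildStatus.TERMINATED, ChildStatus.TERMINATING, ChildStatus.TERMINATING));
    }
}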

 

I think we need to find a way to properly fix this. 

 

 

On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <lasindu@wso2.com> wrote:

Hi Martin,

 

I was able to reproduce this issue in the latest build with PCA on OpenStack. Even after Stratos is restarted, the application is not undeployed, which makes it impossible to undeploy the application (even the forceful undeployment failed for the above obsolete application).

 

Currently I'm looking at possible causes for this and will update with the progress.

 

Thanks,

 

On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <meppel@cisco.com> wrote:

Here is another example where the removal fails:

 

For application see [1.], log file (with debug enabled) and jsons are attached.

 

Scenario:

 

·        Deploy application and wait for all cartridges to become active

·        Kill a VM (2nd in startup sequence)

·        Wait for it to restart and become active

·        Un-deploy application

a.      Un-deploy forcefully will succeed
([2015-06-08 20:38:21,487]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Forcefully un-deploying the application s-g-c1-c2-c3-s)

b.      Un-deploy gracefully will fail to remove app completely (although VMs are terminated successfully)
([2015-06-08 20:54:16,372]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Starting to undeploy application: [application-id])

·        Both scenarios are recorded in the same log file wso2carbon-s-g-c1-c2-c3-s.log

·        Btw, I retested the scenario and the issue is easily reproducible following the steps listed above:
graceful application un-deploy succeeds if no VM has been restarted (terminated and restarted by the autoscaler).
Once a VM is terminated, graceful application un-deploy will fail.
I attached a log file which demonstrates this case (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same application is deployed, becomes active and is then removed (repeated 2 times); then a VM is terminated and restarted by the autoscaler. Afterwards, graceful application un-deploy fails.

 

 

Other Observations:

 

When the application is removed successfully, some events, e.g. the “cluster removed event” and “Application deleted event received:”, are published (see [2.]), while when the application fails to be removed, no such events are observed.

 

[2.] cluster removed event when application is un-deployed forcefully

TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver} -  Application deleted event received: [application-id] s-g-c1-c2-c3-s

TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing application clusters removed event: [application-id] s-g-c1-c2-c3-s

 

 

I compared the log sequences of the successful and the unsuccessful application removal and noticed a difference (see also the highlighted areas):

 

Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)

 

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired

TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -  Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]

TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]

TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]

TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released

TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [ s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry

TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Applications updated: {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}

TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1

TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group] s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  GroupProcessor chain calculating the status for the group [ s-g-c1-c2-c3-s ]

TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  StatusChecker calculating the active status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]

TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired

TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released

TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]

TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired

TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending application instance terminated for [application] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

 

Unsuccessful:

 

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired

TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating

TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released

TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating

TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released

TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatingProcessor} -  StatusChecker calculating the terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]

TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired

TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released

TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  StatusChecker calculating the inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]

TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired

TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released

TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  No possible state change found for [component] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1

TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]

TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] application [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2 [stack-trace]

java.lang.Thread.getStackTrace(Thread.java:1589)

 

 

 

 

[1.] Application Structure

 

 

 

 

 

 

From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 4:38 PM


To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

This is another application, see [1.], which fails to get completely removed:

 

Scenario / Observation:

·        After all instances / the application go active, one instance is terminated (to verify the termination behavior). Once the terminated instance is restored, the application is undeployed.

·        After the application undeployment process is started, all instances are terminated.

·        The application still shows up in stratos admin, and subsequent deployments fail.

 

stratos> list-applications

Applications found:

+---------------------+---------------------+----------+

| Application ID      | Alias               | Status   |

+---------------------+---------------------+----------+

| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |

+---------------------+---------------------+----------+

 

 

[1.] Application:

 

 

 

 

 

From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

After re-running it, these are my observations:

 

·        After “Application undeployment process started” is logged, there is a likelihood that (a few) VMs are still launched; I suspect this is due to some race condition between the application undeployment process and the autoscaler.

·        All VMs which were launched before “Application undeployment process started” get terminated as part of the undeployment process.

·        VMs which were launched after “Application undeployment process started” eventually get moved to the obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

·        The application never gets completely removed.

·        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

·        Initiating the “Application undeployment process” again will cause the following INFO statement (without any further action, see log):
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

·        Other exceptions observed after “Application undeployment process started”:

TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance

org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException

        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)

        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

        at java.lang.Class.newInstance(Class.java:374)

        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)

        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)

        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)

        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)

 

·        Created a JIRA to track this issue: https://issues.apache.org/jira/browse/STRATOS-1430

 

 

 

Regards

 

Martin

 

Attached is the log file of the last test.

 

 

 

 

From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

For this latest test I got the latest source from the Stratos repo, so I have this commit (see below), but the un-deployment still fails (to some extent).

As mentioned below, it seems that all the members get terminated eventually, including the ones which got started after the “application un-deployment” process started.

What is still left in Stratos (even after all members got terminated) is the application (see the stratos> list-applications command result below in the email thread). This would still be an issue when re-deploying the application!

I will do a few reruns to verify the removal of the VMs (members) is consistent.

Thanks

 

Martin

 

git show 2fe84b91843b20e91e8cafd06011f42d218f231c

commit 2fe84b91843b20e91e8cafd06011f42d218f231c

Author: anuruddhal <anuruddha789@gmail.com>

Date:   Wed Jun 3 14:41:12 2015 +0530

 

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

Hi Martin,

 

I also encountered a similar issue with application un-deployment with PCA, but I guess you are using JCA.

 

I can see that Anuruddha has done a fix for the issue I'm referring to with the below commit:

 

Regarding the member context not found error, this could occur if the termination request was made for an already terminated member. There is a possibility that the Autoscaler makes a second terminate request if the first request takes some time to execute, and by the time the second request hits the Cloud Controller the member has already been terminated by the first request.
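For what it is worth, a minimal sketch (hypothetical helper, not the actual Cloud Controller implementation) of treating a repeated terminate request as already handled instead of surfacing the member-context-not-found error:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class MemberTerminatorSketch {

    private final Map<String, Object> memberContexts = new ConcurrentHashMap<>();

    void terminate(String memberId) {
        Object ctx = memberContexts.remove(memberId);
        if (ctx == null) {
            // The first terminate request already removed the context; treat the
            // second request as a no-op instead of raising InvalidMemberException.
            System.out.println("Member already terminated, ignoring: " + memberId);
            return;
        }
        // ... call the IaaS API here to actually destroy the instance ...
    }
}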

 

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?

 

Thanks

 

 

On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <meppel@cisco.com> wrote:

Hi Udara,

 

Picked up your commit and rerun the test case:

 

Attached is the log file (artifacts are the same as before).

 

Didn’t see the issue with “Member is in the wrong list” …

 

but see the following exception after the undeploy application message:

TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

 

 

Also, after “Application undeployment process started” is logged, new members are being instantiated:

 

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:

 

 

Eventually, these VMs get terminated:

 

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f

org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f

                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)

                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)

                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

                    at java.lang.reflect.Method.invoke(Method.java:606)

 

 

but the application remains:

 

stratos> list-applications

Applications found:

+----------------+------------+----------+

| Application ID | Alias      | Status   |

+----------------+------------+----------+

| g-sc-G12-1     | g-sc-G12-1 | Deployed |

+----------------+------------+----------+

 

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']

 

 

 

From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

Ok:

 

log4j.logger.org.apache.stratos.manager=DEBUG

log4j.logger.org.apache.stratos.autoscaler=DEBUG

log4j.logger.org.apache.stratos.messaging=INFO

log4j.logger.org.apache.stratos.cloud.controller=DEBUG

log4j.logger.org.wso2.andes.client=ERROR

# Autoscaler rule logs

log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

 

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

Hi Martin,

 

It would be better if you can enable debug logs for all of AS, CC and the cartridge agent.

 

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <udara@wso2.com> wrote:

Hi,

 

Please enable AS debug logs.

 

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <meppel@cisco.com> wrote:

Hi Udara,

 

Yes, this issue seems to be fairly well reproducible. Which debug logs do you want me to enable, the cartridge agent logs?

 

Thanks

 

Martin

 

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

 

Hi,

 

This might be possible if the AS did not receive the member activated event published by the CC. Is it possible to enable debug logs if this is reproducible?

Otherwise I can add an INFO log and commit.

 

 

On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <udara@wso2.com> wrote:

Hi,

 

 

For the first issue you have mentioned, the particular member is activated, but it is still identified as an obsolete member and is marked to be terminated since the pending time expired. Does that mean the member is still in the obsolete list even though it has been activated?

 

//member started

TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

 

//member activated

TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region

TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

 

//after 15 minutes ---member is still in pending state, pending timeout expired

TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
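Just to illustrate the expectation (a hypothetical sketch, not the real ClusterLevelPartitionContext code): when the member-activated event is processed, the member should leave the pending list, so the pending-timeout watcher can no longer move it to the obsolete list; if that event is missed or not applied, the behaviour in the logs above is what you would see:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class PartitionMembersSketch {

    private final Set<String> pendingMembers = ConcurrentHashMap.newKeySet();
    private final Set<String> activeMembers = ConcurrentHashMap.newKeySet();

    void onMemberStarted(String memberId) {
        pendingMembers.add(memberId);
    }

    void onMemberActivated(String memberId) {
        // If this removal never happens (e.g. the activated event is not applied),
        // the pending-timeout watcher will later move the member to the obsolete list.
        if (pendingMembers.remove(memberId)) {
            activeMembers.add(memberId);
        }
    }
}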

 

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <meppel@cisco.com> wrote:

Hi,

 

I am running into a scenario where application un-deployment fails (using Stratos with the latest commit b1b6bca3f99b6127da24c9af0a6b20faff2907be).

 

For the application structure see [1.]; for the (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment policy and auto-scaling policies, see the attached zip file.

 

It is noteworthy that, while the application is running, the following log statements / exceptions are observed:

 

Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance

// after receiving the application undeploy event:

[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1

// a new instance is being started up

[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

 

// Also noteworthy is the following warning, which is seen repeatedly in the logs:

ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]

 

 

[1.] Application structure

 

 

 

 

 

...

[Message clipped]  



 

--

Lasindu Charith

Software Engineer, WSO2 Inc.

Mobile: +94714427192



 

--

Lasindu Charith

Software Engineer, WSO2 Inc.

Mobile: +94714427192