stratos-dev mailing list archives

From "Martin Eppel (meppel)" <mep...@cisco.com>
Subject RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)
Date Wed, 17 Jun 2015 01:26:53 GMT
Update,

Just noted that the last instance got removed (after a long time), but the application is still
there. Attached are the (complete) log and the application picture [1f.]


[1f.]

[cid:image009.png@01D0A861.FB166F70]


From: Martin Eppel (meppel)
Sent: Tuesday, June 16, 2015 5:55 PM
To: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com)
Cc: Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)


Hi Lasindu, Reka

After incorporating the latest fixes (which solve part of the “application removal
issues”, thanks!), I still have an issue with application removal in a group scaling
scenario. Please see the application structure [1a. …] and the attached
zip file with the artifacts (wso2carbon.log, cartridge-group.json, application.json).

Please also note that I updated the application accordingly and removed the cartridge deployment
policy where a group deployment policy was defined; please review the application.json for
correctness.

Observation: After all VMs go active (including the ones spun up by group scaling) and the
“application removal process” is started, the application fails to be removed. The majority
of the VMs are terminated, but there are 1 or 2 VMs which seem to get started when the application
removal process is initiated. These VMs typically get stuck in the “Started” or “Initialized”
state and are not terminated.
Please note the group scaling progression (1.a … 1.c); what I noted is that the groups
which were created by the autoscaler don’t seem to go into active state, see also the attached
log “wso2carbon-all-active.log”.

After starting the application removal process, see [1d.] and the logs “wso2carbon-after-remove.log”,
“wso2carbon-after-remove-1.log”, and the final log “wso2carbon-after-remove-2.log”.

Thanks

Martin


[1a.]

[cid:image010.png@01D0A861.FB166F70]

[1b.]

[cid:image011.png@01D0A861.FB166F70]

[1c.] all cartridges active, but it seems the “auto scaled” groups are not in active state

[cid:image012.png@01D0A861.FB166F70]

[1d.]

[cid:image013.png@01D0A861.FB166F70]



[1e.] after the application removal process started, an instance is stuck in the “Starting” state,
see log wso2carbon-after-remove-1.log

[cid:image014.png@01D0A861.FB166F70]


Thanks

Martin

From: Martin Eppel (meppel)
Sent: Friday, June 12, 2015 10:16 PM
To: dev
Cc: 'Ryan Du Plessis (rdupless)'
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

Hi Lasindu,

I have run some tests, and the issue (failure to remove an application when an instance is
terminated and restarted) seems to be resolved.

However, I do seem to see some issue with group scaling and application removal, but I still
have to run some tests next week to get a better understanding (not sure yet if this is an
issue or not). Will keep you posted.

Thanks

Martin

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Friday, June 12, 2015 9:41 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

Hi Martin,

I have fixed the above issue in commit 03de83172309c2932075fb5284c120ca610bbf0a. Please take a
pull from master and try out your scenario again to see if undeployment/redeployment works
as expected.

Thanks,


On Thu, Jun 11, 2015 at 11:52 PM, Lasindu Charith <lasindu@wso2.com>
wrote:
Hi Martin,

I guess my previous observation is incorrect. The root cause of the above issue is that
ClusterStatusTerminatedProcessor does not send a ClusterTerminatedEvent for all 3 clusters.
It only sends 1 and fails to send the other 2 ClusterTerminated events. This is because, when
the application is activated again, a ClusterLevelPartitionContext is added twice to the clusterInstanceContext.
This makes the if condition at [1] fail when trying to find out whether the cluster monitor
has any non-terminated members in ClusterStatusTerminatedProcessor before sending the ClusterTerminated
event. Will try to find a proper solution and update the thread.


[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/cluster/ClusterStatusTerminatedProcessor.java#L90
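
To illustrate the effect described above, here is a minimal, self-contained sketch (the class names ending in "Sketch" and the member bookkeeping are simplified stand-ins, not the actual Stratos code): if the same partition context ends up registered twice on re-activation, a stale copy keeps reporting a non-terminated member, so a check like the one at [1] never allows the ClusterTerminated event to be sent.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for the Autoscaler context classes.
class ClusterLevelPartitionContextSketch {
    final Set<String> nonTerminatedMembers = new HashSet<>();
}

class ClusterInstanceContextSketch {
    final List<ClusterLevelPartitionContextSketch> partitionContexts = new ArrayList<>();

    boolean hasNonTerminatedMembers() {
        for (ClusterLevelPartitionContextSketch ctx : partitionContexts) {
            if (!ctx.nonTerminatedMembers.isEmpty()) {
                return true;
            }
        }
        return false;
    }
}

public class ClusterTerminatedCheckSketch {
    public static void main(String[] args) {
        ClusterInstanceContextSketch instance = new ClusterInstanceContextSketch();

        // Partition context added when the cluster is first activated.
        ClusterLevelPartitionContextSketch original = new ClusterLevelPartitionContextSketch();
        original.nonTerminatedMembers.add("c1-member-1");
        instance.partitionContexts.add(original);

        // Duplicate context added when the application is activated again.
        ClusterLevelPartitionContextSketch duplicate = new ClusterLevelPartitionContextSketch();
        duplicate.nonTerminatedMembers.add("c1-member-1");
        instance.partitionContexts.add(duplicate);

        // Undeployment terminates the member, but only one copy is updated.
        original.nonTerminatedMembers.remove("c1-member-1");

        // Analogous to the check at [1]: the stale duplicate still reports a
        // non-terminated member, so the ClusterTerminated event is never sent.
        System.out.println("safe to send ClusterTerminatedEvent: "
                + !instance.hasNonTerminatedMembers());
    }
}

With the duplicate registration this prints false even after the member is gone; without the second add it prints true and the event could be sent.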

Thanks,


On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <meppel@cisco.com>
wrote:
Is there any conclusion on how to fix this?

Thanks


Martin

From: Lahiru Sandaruwan [mailto:lahirus@wso2.com]
Sent: Wednesday, June 10, 2015 6:55 PM
To: dev
Cc: Reka Thirunavukkarasu

Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

Hi Imesh,

The following could be the possible reasons for not un-deploying when a member was auto-healed:


  *   The particular cluster that the member was auto-healed in is terminated before the others (while
the others are in terminating state),
or

  *   The particular cluster that the member was auto-healed in is still terminating while the others
are in terminated state.
One of those two cases could happen even if the member was not auto-healed (in the case of groups
where one group is very complex and the others are simple), because currently, when the parent group
is terminating, we check whether all the clusters and groups are in terminating status,
which is wrong.
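
As a rough sketch of that last point (the enum and helpers below are illustrative only, not the actual GroupStatusProcessor code): while the parent is terminating, children that have already reached Terminated should arguably also satisfy the check, instead of requiring every child to be exactly Terminating.

import java.util.Arrays;
import java.util.List;

public class ParentTerminatingCheckSketch {

    enum ChildStatus { ACTIVE, TERMINATING, TERMINATED }

    // Behaviour described above: while the parent group is terminating we
    // require every child to be exactly in Terminating state, so a child that
    // has already reached Terminated makes the check fail.
    static boolean allChildrenTerminating(List<ChildStatus> children) {
        return children.stream().allMatch(s -> s == ChildStatus.TERMINATING);
    }

    // Possible direction: while the parent is terminating, accept children
    // that are either still Terminating or already Terminated.
    static boolean allChildrenTerminatingOrTerminated(List<ChildStatus> children) {
        return children.stream().allMatch(
                s -> s == ChildStatus.TERMINATING || s == ChildStatus.TERMINATED);
    }

    public static void main(String[] args) {
        List<ChildStatus> mixed =
                Arrays.asList(ChildStatus.TERMINATED, ChildStatus.TERMINATING);
        System.out.println(allChildrenTerminating(mixed));             // false
        System.out.println(allChildrenTerminatingOrTerminated(mixed)); // true
    }
}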

Thanks.

On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <imesh@apache.org>
wrote:
Do we know why this only happens if a member was forcefully terminated and auto-healed?

On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <lasindu@wso2.com>
wrote:
Hi  all,

The cause for the above issue seems to be as follows.
GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor process the event only
if all the group instances and cluster instances are in terminated state or in terminating
state, respectively [1][2]. But there can be situations (such as the above) where some group instances
are in terminated state and some are in terminating state by the time the GroupStatusProcessorChain
is executed. In such scenarios, both the GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
executions are skipped, and GroupStatusInactiveProcessor prints the "No possible state change
found" warning.

I think we need to find a way to properly fix this.

[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
[2] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
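
To make the fall-through concrete, here is a minimal sketch of the chain behaviour described above (illustrative only; the enum and method are simplified stand-ins for the processor classes at [1][2]):

import java.util.Arrays;
import java.util.List;

public class GroupStatusChainSketch {

    enum InstanceStatus { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

    static String process(List<InstanceStatus> children) {
        // GroupStatusTerminatedProcessor: fires only if every child instance is Terminated.
        if (children.stream().allMatch(s -> s == InstanceStatus.TERMINATED)) {
            return "send group instance terminated event";
        }
        // GroupStatusTerminatingProcessor: fires only if every child instance is Terminating.
        if (children.stream().allMatch(s -> s == InstanceStatus.TERMINATING)) {
            return "send group instance terminating event";
        }
        // GroupStatusInactiveProcessor: end of the chain for this scenario.
        return "WARN: No possible state change found";
    }

    public static void main(String[] args) {
        // Mixed state seen during undeployment: one child already Terminated,
        // another still Terminating -> neither of the first two processors fires.
        System.out.println(process(Arrays.asList(
                InstanceStatus.TERMINATED, InstanceStatus.TERMINATING)));
    }
}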

On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <lasindu@wso2.com>
wrote:
Hi Martin,

I was able to reproduce this issue in the latest build with the PCA on OpenStack. Even after Stratos
is restarted, the application is not undeployed, which makes it impossible to undeploy the
application (even forceful undeployment failed for the above obsolete application).

Currently I'm looking at possible causes for this and will update with the progress.

Thanks,

On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <meppel@cisco.com>
wrote:
Here is another example where the removal fails:

For application see [1.], log file (with debug enabled) and jsons are attached.

Scenario:


•        Deploy application and wait for all cartridges to become active

•        Kill a VM (2nd in startup sequence)

•        Wait for it to restart and become active

•        Un-deploy application

a.      Un-deploy forcefully will succeed
([2015-06-08 20:38:21,487]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl}
-  Forcefully un-deploying the application s-g-c1-c2-c3-s)
and

b.      Un-deploy gracefully will fail to remove app completely (although VMs are terminated
successfully)
([2015-06-08 20:54:16,372]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl}
-  Starting to undeploy application: [application-id])

•        Both scenarios are recorded in the same log file wso2carbon-s-g-c1-c2-c3-s.log

•        Btw, I retested the scenario and the issue is easily reproducible following the
steps listed above:
graceful application un-deploy succeeds if no VM has been restarted (terminated and restarted
by the autoscaler).
Once a VM is terminated, graceful application un-deploy will fail.
I attached a log file which demonstrates this case (wso2carbon-s-g-c1-c2-c3-s-scen-2.log).
In this scenario, the same application is deployed, becomes active and is then removed (repeated
2 times); then a VM is terminated and restarted by the autoscaler. Afterwards, graceful application
un-deploy fails.


Other Observations:

When the application is removed successfully, some events, e.g. “cluster removed event” and “Application
deleted event received:”, are published (see [2.]), while when the application fails
to be removed, no such events are observed.

[2.] cluster removed event when application is un-deployed forcefully
TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver}
-  Application deleted event received: [application-id] s-g-c1-c2-c3-s
TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
-  Publishing application clusters removed event: [application-id] s-g-c1-c2-c3-s


I compared the log sequences of the successful and the unsuccessful application removal
and noticed a difference (see also the highlighted areas):
Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
-  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for
the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
-  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder}
-  Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x [application-id] s-g-c1-c2-c3-s
[instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor}
-  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain,
hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor}
-  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain,
hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor}
-  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain,
hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager}
-  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager}
-  Application [ s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry
TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Applications updated: {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}
TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
-  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s [group] s-g-c1-c2-c3-s-x0x
[instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO {org.apache.stratos.autoscaler.monitor.component.GroupMonitor}
-  [Group] s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
-  GroupProcessor chain calculating the status for the group [ s-g-c1-c2-c3-s ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
-  StatusChecker calculating the active status for the group [ s-g-c1-c2-c3-s ]  for the instance
 [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
-  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s ]  for the
instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
-  Sending application instance terminated for [application] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

Unsuccessful:

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
-  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for
the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
-  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager}
-  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
-  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatingProcessor}
-  StatusChecker calculating the terminating status for the group [ s-g-c1-c2-c3-s-x0x ] 
for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
-  StatusChecker calculating the inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for
the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder}
-  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
-  No possible state change found for [component] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor}
-  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain,
hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor}
-  System error, lock has not released for 30 seconds: [lock-name] application [lock-type]
Write [thread-id] 99 [thread-name] pool-26-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)




[1.] Application Structure
[cid:image015.png@01D0A861.FB166F70]






From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 4:38 PM

To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

This is another application (see [1.]) which fails to get completely removed:

Scenario / Observation:

•        After all instances / the application go active, one instance is terminated (to
verify termination behavior). Once the terminated instance is restored, the application is
undeployed.

•        After the application undeployment process is started, all instances are
terminated.

•        The application still shows up in the Stratos admin; subsequent deployments fail.

stratos> list-applications
Applications found:
+---------------------+---------------------+----------+
| Application ID      | Alias               | Status   |
+---------------------+---------------------+----------+
| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
+---------------------+---------------------+----------+


[1.] Application:

[cid:image016.png@01D0A861.FB166F70]




From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

After re-running it, these are my observations:


•        After “Application undeployment process started”, there is a
likelihood that (a few) VMs are still launched; I suspect this is due to some race condition
between the “Application undeployment process started” handling and the “autoscaler”.

•        All VMs which were launched before “Application undeployment process started”
get terminated as part of the undeployment process.

•        VMs which were launched after “Application undeployment process started” eventually
get moved to the obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

•        The application never gets completely removed.

•        The following exception is consistently observed:

ID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock}
-  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]
application-holder [thread-id] 114 [thread-name] pool-24-thread-2

TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
-  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire
a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id]
114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

•        Initiating the “Application undeployment process” again will cause the following
INFO statement (without any further action, see log):
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl}
-  Application monitor is already in terminating, graceful un-deployment is has already been
attempted thus not invoking again

•        Other exceptions observed after “Application undeployment process started”:
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator}
-  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException:
CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


•        Created a JIRA to track this issue: https://issues.apache.org/jira/browse/STRATOS-1430

Regards

Martin

Attached is the log file of the last test.

From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

For this latest test I got the latest source from the Stratos repo, so I have this commit (see
below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members get terminated eventually, including the
ones which got started after the “application un-deployment” process started.
What is still left in Stratos (even after all members got terminated) is the application (see
the stratos> list-applications command result below in the email thread). This would still
be an issue when re-deploying the application!
I will do a few reruns to verify that the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <anuruddha789@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with the application un-deployment with PCA but I guess
you are using JCA.

I can see that Anuruddha has done a fix for the issue I'm referring to, in the below commit:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the "member context not found" error, this could occur if the termination request
was made for an already terminated member. There is a possibility that the Autoscaler makes a second
terminate request if the first request takes some time to execute, and by the time the second
request hits the Cloud Controller the member has already been terminated by the first request.
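
A minimal sketch of that race, under assumed names (they are illustrative, not the real CloudControllerServiceImpl API): the first terminate removes the member context, so a duplicate request can only fail with "member context not found", and a caller that expects duplicates could treat that as "already terminated".

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DuplicateTerminateSketch {

    static class InvalidMemberExceptionSketch extends Exception {
        InvalidMemberExceptionSketch(String message) { super(message); }
    }

    // Stand-in for the member context registry kept by the Cloud Controller.
    private final Map<String, Object> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId) {
        memberContexts.put(memberId, new Object());
    }

    void terminateInstance(String memberId) throws InvalidMemberExceptionSketch {
        if (memberContexts.remove(memberId) == null) {
            // A duplicate request arrives after the first one removed the context.
            throw new InvalidMemberExceptionSketch(
                    "member context not found: [member-id] " + memberId);
        }
        // ... the IaaS termination call would happen here ...
    }

    public static void main(String[] args) {
        DuplicateTerminateSketch cc = new DuplicateTerminateSketch();
        String memberId = "g-sc-G12-1.c1-0x0.c1.domain-member-1";
        cc.register(memberId);
        try {
            cc.terminateInstance(memberId); // first request succeeds
            cc.terminateInstance(memberId); // second request hits the error above
        } catch (InvalidMemberExceptionSketch e) {
            // If duplicate terminates are expected, this could be logged and
            // treated as "already terminated" rather than as an error.
            System.out.println("already terminated: " + e.getMessage());
        }
    }
}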

Can you please confirm whether the members were properly terminated and it's just these exceptions
that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <meppel@cisco.com>
wrote:
Hi Udara,

Picked up your commit and reran the test case:

Attached is the log file (artifacts are the same as before).

Didn’t see the issue with “Member is in the wrong list” …

but I do see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
-  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire
a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id]
114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)


Also, after the “Application undeployment process” is started, new members are
being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
-  Publishing member created event:


Eventually, these VMs get terminated:

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
-  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate
instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

Hi Martin,

It would be better if you can enable debug logs for all of AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <udara@wso2.com>
wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <meppel@cisco.com>
wrote:
Hi Udara,

Yes, this issue seems to be fairly reproducible. Which debug logs do you want me to enable,
the cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy
(nested grouping, group scaling)

Hi,

This might be possible if the AS did not receive the member activated event published by the CC. Is it
possible to enable debug logs if this is reproducible?
Otherwise I can add some INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <udara@wso2.com>
wrote:
Hi,


For the first issue you have mentioned, the particular member is activated, but it is still
identified as an obsolete member and is marked to be terminated since its pending time expired.
Does that mean the member is still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext}
-  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
[clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
-  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain
[cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
[network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
-  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
-  Pending state of member expired, member will be moved to obsolete list. [pending member]
g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster]
g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
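
To restate the above as a sketch (simplified, assumed names, not the real ClusterLevelPartitionContext or its watcher): if the member activated event is never applied to the pending list, a pending-member expiry like the one in the log will move an actually running member to the obsolete list.

import java.util.ArrayList;
import java.util.List;

public class PendingMemberExpirySketch {

    // Simplified pending/active/obsolete bookkeeping for one partition.
    static class PartitionContextSketch {
        final List<String> pendingMembers = new ArrayList<>();
        final List<String> activeMembers = new ArrayList<>();
        final List<String> obsoleteMembers = new ArrayList<>();

        // Expected on a member activated event: the member leaves the pending list.
        void onMemberActivated(String memberId) {
            if (pendingMembers.remove(memberId)) {
                activeMembers.add(memberId);
            }
        }

        // What the pending-member watcher does after the expiry time
        // (900000 ms in the log above): anything still pending becomes obsolete
        // and will later be terminated, even if the VM is actually active.
        void expirePendingMembers() {
            obsoleteMembers.addAll(pendingMembers);
            pendingMembers.clear();
        }
    }

    public static void main(String[] args) {
        PartitionContextSketch ctx = new PartitionContextSketch();
        String memberId = "g-sc-G12-1.c1-0x0.c1.domain-member";

        ctx.pendingMembers.add(memberId);
        // Suppose the member activated event is never applied here ...
        ctx.expirePendingMembers();

        System.out.println("obsolete members: " + ctx.obsoleteMembers);
    }
}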

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <meppel@cisco.com>
wrote:
Hi,

I am running into a scenario where application un-deployment fails (using Stratos with the latest
commit b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For the application structure see [1.]; for the (debug enabled) wso2carbon.log, application.json, cartridge-group.json,
deployment policy and auto-scaling policies see the attached zip file.

It is noteworthy that while the application is running, the following log statements / exceptions
are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator}
-  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl}
-  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator}
-  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain
[instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy: the following warning is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the
same thread: [lock-name]


[1.] Application structure

[cid:image017.png@01D0A861.FB166F70]




...

[Message clipped]



--
Lasindu Charith
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: blog.lasindu.com