stratos-dev mailing list archives

From Lasindu Charith <lasi...@wso2.com>
Subject Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)
Date Fri, 12 Jun 2015 16:40:50 GMT
Hi Martin,

I have fixed the above issue in
commit 03de83172309c2932075fb5284c120ca610bbf0a. Please pull from
master and try out your scenario again to see whether
undeployment/redeployment works as expected.

Thanks,


On Thu, Jun 11, 2015 at 11:52 PM, Lasindu Charith <lasindu@wso2.com> wrote:

> Hi Martin,
>
> I guess my previous observation is incorrect. The root cause of the above
> issue is that *ClusterStatusTerminatedProcessor* does not send a *ClusterTerminatedEvent*
> for all 3 clusters. It sends only 1 and fails to send the other 2
> clusterTerminated events. This is because, when the application is activated
> again, *ClusterLevelPartitionContext* is added twice to the *clusterInstanceContext*.
> This makes the if condition at [1] fail when trying to find out
> whether the cluster monitor has any non-terminated members at
> *ClusterStatusTerminatedProcessor* before sending the clusterTerminated
> event. I will try to find a proper solution and update the thread.
>
>
> [1]
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/cluster/ClusterStatusTerminatedProcessor.java#L90
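>
> For illustration, here is a minimal sketch of the kind of duplicate-add
> guard this points at (the class and method names are assumptions for the
> sketch, not the actual Stratos API):
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     // Hypothetical stand-ins for ClusterLevelPartitionContext and
>     // ClusterInstanceContext: the add is skipped when a context for
>     // the same partition id is already tracked, so re-activation
>     // cannot register the same partition context twice.
>     class PartitionContext {
>         private final String partitionId;
>
>         PartitionContext(String partitionId) {
>             this.partitionId = partitionId;
>         }
>
>         String getPartitionId() {
>             return partitionId;
>         }
>     }
>
>     class InstanceContext {
>         private final List<PartitionContext> partitionContexts = new ArrayList<>();
>
>         void addPartitionContext(PartitionContext context) {
>             for (PartitionContext existing : partitionContexts) {
>                 if (existing.getPartitionId().equals(context.getPartitionId())) {
>                     return; // already tracked; skip the duplicate
>                 }
>             }
>             partitionContexts.add(context);
>         }
>     }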
>
> Thanks,
>
>
> On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <meppel@cisco.com>
> wrote:
>
>>  Is there any conclusion on how to fix this?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Lahiru Sandaruwan [mailto:lahirus@wso2.com]
>> *Sent:* Wednesday, June 10, 2015 6:55 PM
>> *To:* dev
>> *Cc:* Reka Thirunavukkarasu
>>
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Imesh,
>>
>>
>>
>> The following could be the possible reasons for not un-deploying when a member
>> was auto-healed:
>>
>>
>>
>>    - The particular cluster, in which the member was auto-healed, is
>>    terminated before the others (while the others are in terminating state)
>>
>>  or
>>
>>    - The particular cluster, in which the member was auto-healed, is still
>>    terminating when the others are in terminated state
>>
>>  One of those two cases could happen even if the member was not auto-healed
>> (in the case of groups, where one group is very complex and the others are
>> simple), because currently, when the parent group is *terminating,* we check
>> whether all the clusters and groups are in *terminating* status, which
>> is wrong.
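>>
>> A minimal sketch of a more tolerant check (assumed names, not the
>> actual Stratos code): a child that has already reached Terminated should
>> also satisfy the parent's Terminating check, since children finish at
>> different speeds.
>>
>>     import java.util.List;
>>
>>     class TerminatingCheckSketch {
>>         enum Status { CREATED, ACTIVE, TERMINATING, TERMINATED }
>>
>>         // The parent counts as Terminating when every child is either
>>         // Terminating or already Terminated, and at least one child is
>>         // still Terminating.
>>         static boolean parentIsTerminating(List<Status> children) {
>>             boolean anyTerminating = false;
>>             for (Status status : children) {
>>                 if (status == Status.TERMINATING) {
>>                     anyTerminating = true;
>>                 } else if (status != Status.TERMINATED) {
>>                     return false;
>>                 }
>>             }
>>             return anyTerminating;
>>         }
>>     }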
>>
>>
>>
>> Thanks.
>>
>>
>>
>> On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <imesh@apache.org>
>> wrote:
>>
>> Do we know why this only happens if a member was forcefully terminated
>> and auto-healed?
>>
>>
>>
>> On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <lasindu@wso2.com>
>> wrote:
>>
>> Hi all,
>>
>>
>>
>> The cause of the above issue seems to be as follows.
>>
>> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
>> process the event only if all the group instances and cluster instances
>> are in terminated state or in terminating state respectively [1][2]. But
>> there can be situations (such as the above) where some group instances are in
>> terminated state and some in terminating state by the
>> time the GroupStatusProcessorChain is executed. In such scenarios, both the
>> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
>> executions are skipped, and GroupStatusInactiveProcessor prints the "No
>> possible state change found" warning.
>>
>>
>>
>> I think we need to find a way to fix this properly; a sketch of the
>> fall-through is below the references.
>>
>>
>>
>> [1]
>> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
>>
>> [2]
>> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
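>>
>> A minimal sketch of the fall-through described above (assumed names, not
>> the actual Stratos implementation): each processor claims the event only
>> when all children are in its single target state, so a mix of Terminated
>> and Terminating children slips through the whole chain.
>>
>>     import java.util.List;
>>
>>     class ProcessorChainSketch {
>>         enum Status { ACTIVE, INACTIVE, TERMINATING, TERMINATED }
>>
>>         static String process(List<Status> children) {
>>             if (children.stream().allMatch(s -> s == Status.TERMINATED)) {
>>                 return "group instance terminated";
>>             }
>>             if (children.stream().allMatch(s -> s == Status.TERMINATING)) {
>>                 return "group instance terminating";
>>             }
>>             // a mix of Terminated and Terminating states ends up here
>>             return "No possible state change found";
>>         }
>>     }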
>>
>>
>>
>> On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <lasindu@wso2.com> wrote:
>>
>> Hi Martin,
>>
>>
>>
>> I was able to reproduce this issue in the latest build with the PCA on
>> OpenStack. Even after Stratos is restarted, the application is not
>> undeployed, which makes it impossible to remove the application (even
>> forceful undeployment failed for the above obsolete application).
>>
>>
>>
>> Currently I'm looking into possible causes for this and will update the
>> thread with my progress.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <meppel@cisco.com>
>> wrote:
>>
>> Here is another example where the removal fails:
>>
>>
>>
>> For the application see [1.]; the log file (with debug enabled) and JSONs are
>> attached.
>>
>>
>>
>> Scenario:
>>
>>
>>
>> ·        Deploy application and wait for all cartridges to become active
>>
>> ·        Kill a VM (2nd in startup sequence)
>>
>> ·        Wait for it to restart and become active
>>
>> ·        Un-deploy application
>>
>> a.      Un-deploying forcefully will succeed
>> ([2015-06-08 20:38:21,487]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Forcefully un-deploying the application s-g-c1-c2-c3-s)
>>
>> b.      Un-deploying gracefully will fail to remove the app completely
>> (although VMs are terminated successfully)
>> ([2015-06-08 20:54:16,372]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Starting to undeploy application: [application-id])
>>
>> ·        Both scenarios are recorded in the same log file
>> wso2carbon-s-g-c1-c2-c3-s.log
>>
>> ·        Btw, I retested the scenario and the issue is easily
>> reproducible following the steps listed above:
>> graceful application un-deploy succeeds if no VM has been restarted
>> (terminated and restarted by the autoscaler).
>> Once a VM is terminated, graceful application un-deploy will fail.
>> I attached a log file which demonstrates this case
>> (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same
>> application is deployed, becomes active and is then removed (repeated 2
>> times); then a VM is terminated and restarted by the autoscaler. Afterwards,
>> graceful application un-deploy fails.
>>
>>
>>
>>
>>
>> Other Observations:
>>
>>
>>
>> When the application is removed successfully, events such as “cluster removed
>> event” and “Application deleted event received” are published (see [2.]),
>> while when the application fails to be removed, no such events are
>> observed.
>>
>>
>>
>> [2.] cluster removed event when application is un-deployed forcefully
>>
>> TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO
>> {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver}
>> -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
>>
>> TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing application clusters removed event: [application-id]
>> s-g-c1-c2-c3-s
>>
>>
>>
>>
>>
>> I analyzed the differences between the successful and unsuccessful
>> application removal log sequences and noticed a difference (see also the
>> highlighted areas):
>>
>>
>>
>> Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>> s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>> -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x
>> [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>> {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -
>> Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x
>> [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG
>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>> Write lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG
>> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [
>> s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -
>> Applications updated:
>> {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO
>> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
>> -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s
>> [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group]
>> s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance]
>> s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusActiveProcessor}** -  GroupProcessor chain calculating the
>> status for the group [ s-g-c1-c2-c3-s ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
>> -  StatusChecker calculating the active status for the group [
>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>> -  StatusChecker calculating the terminated status for the group [
>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>> -  Sending application instance terminated for [application] s-g-c1-c2-c3-s
>> [instance] s-g-c1-c2-c3-s-1*
>>
>>
>>
>> Unsuccessful:
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>> s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>> status is: Terminating*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>> Write lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>> status is: Terminating*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusTerminatingProcessor**} -  StatusChecker calculating the
>> terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance
>> [ s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusInactiveProcessor**} -  StatusChecker calculating the
>> inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>> s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
>> -  **No possible state change found for* *[component] s-g-c1-c2-c3-s-x0x
>> [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR
>> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
>> error, lock has not released for 30 seconds: [lock-name] application
>> [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2
>> [stack-trace] *
>>
>> *java.lang.Thread.getStackTrace(Thread.java:1589)*
>>
>>
>> [1.] Application Structure
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 4:38 PM
>>
>>
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> This is another application, see [1.], which fails to get completely
>> removed:
>>
>>
>>
>> Scenario / Observation:
>>
>> ·        After all instances / the application go active, one instance is
>> terminated (to verify termination behavior). Once the terminated
>> instance is restored, the application is undeployed.
>>
>> ·        After the application undeployment process is started, all
>> instances are terminated
>>
>> ·        The application still shows up in stratos admin, and subsequent
>> deployments fail
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +---------------------+---------------------+----------+
>>
>> | Application ID      | Alias               | Status   |
>>
>> +---------------------+---------------------+----------+
>>
>> | s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
>>
>> +---------------------+---------------------+----------+
>>
>>
>>
>>
>>
>> [1.] Application:
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 3:26 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> After re-running it, these are my observations:
>>
>>
>>
>> ·        After the “Application undeployment process started” message, there
>> is a likelihood that (a few) VMs are still launched – I
>> suspect this is due to some race condition between the “Application
>> undeployment process started” handling and the autoscaler.
>>
>> ·        All VMs which were launched before the “Application
>> undeployment process started” message get terminated as part of the
>> undeployment process.
>>
>> ·        VMs which were launched after “Application undeployment process
>> started” eventually get moved to obsolete / pending state and cleaned up;
>> this can take up to 15-20 minutes.
>>
>> ·        The application never gets completely removed.
>>
>> ·        The following exception is consistently observed:
>>
>> TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN
>> {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System
>> warning! Trying to release a lock which has not been taken by the same
>> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>> pool-24-thread-2
>>
>> *TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR
>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>> -  Failed to retrieve topology event message*
>>
>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>> System error, cannot acquire a write lock while having a read lock on the
>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>> pool-24-thread-2*
>>
>> *                    at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>
>> *                    at
>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>
>> ·        Initiating the “Application undeployment process” again will
>> cause the following INFO statement (without any further actions; see the log)
>> TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Application monitor is already in terminating, graceful un-deployment is
>> has already been attempted thus not invoking again
>>
>> ·        Other exceptions observed after the “Application undeployment
>> process started”
>>
>> TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR
>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>> instance
>>
>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException:
>> CloudControllerServiceInvalidMemberExceptionException
>>
>>         at
>> sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
>>
>>         at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>
>>         at java.lang.Class.newInstance(Class.java:374)
>>
>>         at
>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
>>
>>         at
>> org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
>>
>>         at
>> org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
>>
>>         at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)
>>
>>
>>
>> ·        Created a jira to track this issue:
>> https://issues.apache.org/jira/browse/STRATOS-1430
>>
>>
>>
>>
>>
>>
>>
>> Regards
>>
>>
>>
>> Martin
>>
>>
>>
>> Attached the log file of the last test
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 12:59 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> For this latest test I got the latest source from the stratos repo, so I have
>> this commit (see below), but the un-deployment still fails (to some extent).
>>
>> As mentioned below, it seems that all the members get terminated
>> eventually, including the ones which got started after the “application
>> un-deployment” process started.
>>
>> What is still left in stratos (even after all members got terminated) is
>> the application (see the stratos> list-applications command result below in
>> the email thread). This would still be an issue when re-deploying the
>> application!
>>
>> I will do a few reruns to verify that the removal of the VMs (members) is
>> consistent.
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> git show 2fe84b91843b20e91e8cafd06011f42d218f231c
>>
>> commit 2fe84b91843b20e91e8cafd06011f42d218f231c
>>
>> Author: anuruddhal <anuruddha789@gmail.com>
>>
>> Date:   Wed Jun 3 14:41:12 2015 +0530
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org <imesh@apache.org>]
>> *Sent:* Friday, June 05, 2015 12:46 PM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>>
>>
>> I also encountered a similar issue with the application un-deployment
>> with PCA but I guess you are using JCA.
>>
>>
>>
>> I can see that Anuruddha has done a fix for the issue I'm referring to, with
>> the commit below:
>>
>>
>> https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c
>>
>>
>>
>> Regarding the member context not found error, this could occur if the
>> termination request was made for an already terminated member. There is a
>> possibility that the Autoscaler makes a second terminate request if the first
>> request takes some time to execute, and by the time the second request hits
>> the Cloud Controller the member has already been terminated by the first request.
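>>
>> A minimal sketch of an idempotent way to absorb that race (assumed names,
>> not the actual Cloud Controller code): treating a missing member context
>> as "already terminated" lets the second request return quietly instead of
>> surfacing an InvalidMemberException.
>>
>>     import java.util.concurrent.ConcurrentHashMap;
>>     import java.util.concurrent.ConcurrentMap;
>>
>>     class TerminateSketch {
>>         static class MemberContext {
>>             String instanceId;
>>         }
>>
>>         private final ConcurrentMap<String, MemberContext> memberContexts =
>>                 new ConcurrentHashMap<>();
>>
>>         // remove() is atomic, so only one of two racing terminate
>>         // requests sees the context; the other one returns quietly.
>>         void terminateInstance(String memberId) {
>>             MemberContext context = memberContexts.remove(memberId);
>>             if (context == null) {
>>                 System.out.println("Member already terminated, ignoring: " + memberId);
>>                 return;
>>             }
>>             // ... call the IaaS here to terminate context.instanceId ...
>>         }
>>     }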
>>
>>
>>
>> Can you please confirm whether the members were properly terminated and
>> it's just these exceptions that you are seeing?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <meppel@cisco.com>
>> wrote:
>>
>> Hi Udara,
>>
>>
>>
>> Picked up your commit and re-ran the test case:
>>
>>
>>
>> Attached is the log file (artifacts are the same as before).
>>
>>
>>
>> *Didn’t see the issue with* “*Member is in the wrong list” …*
>>
>>
>>
>> but see the following exception after the undeploy application message:
>>
>> *TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR
>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>> -  Failed to retrieve topology event message*
>>
>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>> System error, cannot acquire a write lock while having a read lock on the
>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>> pool-24-thread-2*
>>
>> *                    at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>
>> *                    at
>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>
>>
>>
>>
>>
>> *Also, after the “Application undeployment process started” message, new
>> members are being instantiated:*
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing member created event*:
>>
>>
>>
>>
>>
>> *Eventually, these VMs get terminated:*
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR
>> {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
>> -  Could not terminate instance: [member-id]
>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>
>> *org.apache.stratos.cloud.controller.exception.InvalidMemberException:
>> Could not terminate instance, member context not found: [member-id]
>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>
>> *                    at
>> org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)*
>>
>> *                    at
>> sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)*
>>
>> *                    at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>>
>> *                    at java.lang.reflect.Method.invoke(Method.java:606)*
>>
>>
>>
>>
>>
>> *but the application remains:*
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +----------------+------------+----------+
>>
>> | Application ID | Alias      | Status   |
>>
>> +----------------+------------+----------+
>>
>> | g-sc-G12-1     | g-sc-G12-1 | Deployed |
>>
>> +----------------+------------+----------+
>>
>>
>>
>> ['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances
>> 3, members 0 ()\n']
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 10:04 AM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Ok:
>>
>>
>>
>> log4j.logger.org.apache.stratos.manager=DEBUG
>>
>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>
>> log4j.logger.org.apache.stratos.messaging=INFO
>>
>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>
>> log4j.logger.org.wso2.andes.client=ERROR
>>
>> # Autoscaler rule logs
>>
>> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>>
>>
>>
>> *From:* Udara Liyanage [mailto:udara@wso2.com <udara@wso2.com>]
>> *Sent:* Friday, June 05, 2015 10:00 AM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>>
>>
>> It would be better if you can enable debug logs for all of AS, CC and the cartridge agent.
>>
>>
>>
>> On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <udara@wso2.com> wrote:
>>
>> Hi,
>>
>>
>>
>> Please enable AS debug logs.
>>
>>
>>
>> On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <meppel@cisco.com>
>> wrote:
>>
>> Hi Udara,
>>
>>
>>
>> Yes, this issue seems to be fairly well reproducible. Which debug logs do
>> you want me to enable, cartridge agent logs?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Udara Liyanage [mailto:udara@wso2.com]
>> *Sent:* Thursday, June 04, 2015 11:11 PM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi,
>>
>>
>>
>> This might be possible if the AS did not receive the member activated event
>> published by the CC. Is it possible to enable debug logs if this is
>> reproducible?
>>
>> Or else, I can add INFO logs and commit.
>>
>>
>>
>>
>>
>> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <udara@wso2.com> wrote:
>>
>> Hi,
>>
>>
>>
>>
>>
>> For the first issue you have mentioned, the particular member is
>> activated, but it is still identified as an obsolete member and is being
>> marked to be terminated since the pending time expired. Does that mean the
>> member is still in the obsolete list even though it has been activated?
>>
>>
>>
>> //member started
>>
>> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
>> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
>> stat context has been added: [application] g-sc-G12-1 [cluster]
>> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
>> [partitionContext] whole-region [member-id]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>
>>
>>
>> //member activated
>>
>> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing member activated event: [service-name] c1 [cluster-id]
>> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>> [network-partition-id] RegionOne [partition-id] whole-region
>>
>> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
>> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
>> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
>> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>
>>
>>
>> //after 15 minutes ---member is still in pending state, pending timeout
>> expired
>>
>> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
>> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
>> -  Pending state of member expired, member will be moved to obsolete list.
>> [pending member]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
>> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
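>>
>> A minimal sketch of the cleanup that the log above suggests is missing
>> (assumed names, not the actual Stratos code): when the member activated
>> event arrives, the member should leave the pending list, so the pending
>> watcher can only ever expire members that are genuinely still pending.
>>
>>     import java.util.concurrent.ConcurrentHashMap;
>>     import java.util.concurrent.ConcurrentMap;
>>
>>     class PendingListSketch {
>>         static class Member {
>>         }
>>
>>         private final ConcurrentMap<String, Member> pendingMembers =
>>                 new ConcurrentHashMap<>();
>>         private final ConcurrentMap<String, Member> activeMembers =
>>                 new ConcurrentHashMap<>();
>>
>>         // Move the member out of the pending list on activation, so the
>>         // pending-expiry watcher never sees an already-active member.
>>         void onMemberActivated(String memberId) {
>>             Member member = pendingMembers.remove(memberId);
>>             if (member != null) {
>>                 activeMembers.put(memberId, member);
>>             }
>>         }
>>     }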
>>
>>
>>
>> On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <meppel@cisco.com>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> I am running into a scenario where application un-deployment fails (using
>> stratos with the latest commit b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>>
>>
>>
>> For the application structure see [1.]; for the (debug enabled) wso2carbon.log,
>> application.json, cartridge-group.json, deployment policy and auto-scaling
>> policies see the attached zip file.
>>
>>
>>
>> *It is noteworthy that, while the application is running, the following
>> log statements/exceptions are observed:*
>>
>>
>>
>> *…*
>>
>> *Member is in the wrong list and it is removed from active members list:
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>>
>> *…*
>>
>> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>> instance*
>>
>> *…*
>>
>> *// **after receiving the application undeploy event:*
>>
>> *[2015-06-04 20:12:39,465]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Application undeployment process started: [application-id] g-sc-G12-1*
>>
>> *// **a new instance is being started up*
>>
>> *…*
>>
>> *[2015-06-04 20:13:13,445]  INFO
>> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
>> Instance started successfully: [cartridge-type] c2 [cluster-id]
>> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
>> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>>
>>
>>
>> *// Also noteworthy is the following warning, which is seen repeatedly
>> in the logs:*
>>
>> *ReadWriteLock} -  System warning! Trying to release a lock which has not
>> been taken by the same thread: [lock-name]*
>>
>>
>>
>>
>>
>> [1.] Application structure
>>
>>
>> ...
>>
>> [Message clipped]
>
>
>
>
> --
> *Lasindu Charith*
> Software Engineer, WSO2 Inc.
> Mobile: +94714427192
> Web: blog.lasindu.com
>



-- 
*Lasindu Charith*
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: blog.lasindu.com
