stratos-dev mailing list archives

From "Vanson Lim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (STRATOS-1282) Stratos4.1.0 - error cleaning up VMs (that have floatingip) terminated through Openstack horizon
Date Fri, 27 Mar 2015 18:14:54 GMT

    [ https://issues.apache.org/jira/browse/STRATOS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384279#comment-14384279 ]

Vanson Lim commented on STRATOS-1282:
-------------------------------------


Udara,

Thanks for the fix.  I've verified on my setup that I no longer see the traceback, and this
test case seems to be behaving properly now.

I took a look at the diffs associated with this commit and have some minor comments.

-Vanson





> Stratos4.1.0 - error cleaning up VMs (that have floatingip) terminated through Openstack horizon
> ------------------------------------------------------------------------------------------------
>
>                 Key: STRATOS-1282
>                 URL: https://issues.apache.org/jira/browse/STRATOS-1282
>             Project: Stratos
>          Issue Type: Bug
>          Components: Cloud Controller
>    Affects Versions: 4.1.0 Beta
>            Reporter: Martin Eppel
>            Priority: Blocker
>
> On 3/23/15, 6:11 AM, Udara Liyanage wrote:
> Hi, 
> I could reproduce this in Openstack. The region and image id of the iaasProvider are null
> at the time of IP releasing. When I set the region in cloud-controller.xml (which is not a
> solution, just for testing), it works without the issue.
> [2015-03-23 15:25:23,067]  INFO {org.apache.stratos.cloud.controller.iaases.JcloudsIaas}
-  Member terminated: [member-id] single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06
> [2015-03-23 15:25:23,076]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
-  Publishing member terminated event: [service-name] php [cluster-id] single-cartridge-app.my-php.php.domain
[cluster-instance-id] single-cartridge-app-1 [member-id] single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06
[network-partition-id] network-partition-1 [partition-id] partition-1 [group-id] null
> [2015-03-23 15:25:23,084]  INFO {org.apache.
> Udara,
> Thanks for looking at this.
> I've confirmed that adding the following to the cloud-controller iaasProvider also seems
> to cover up the problem. I agree, clearly not a solution.
> @@ -13,4 +13,5 @@
>          <property name="openstack.networking.provider" value="nova" />
>         <property name="X" value="x" />
>         <property name="Y" value="y" />
> +       <property name="region" value="RegionOne" />
>  </iaasProvider>
> We'll file a bug to track this.
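
For discussion only, here is a minimal sketch of the kind of guard that would avoid passing a null region to getFloatingIPExtensionForZone. It assumes the instance id keeps the "region/uuid" form seen in the MemberContext dump in the original report (instanceId=RegionOne/83751110-...). The class and method names are hypothetical, and this is not the change that was committed for this issue.

```java
import java.util.Optional;

// Hypothetical helper, not part of Stratos: picks a region for the IP release call.
public final class RegionFallback {
    private RegionFallback() {}

    /**
     * Returns the configured region when it is set; otherwise falls back to the
     * prefix of an instance id of the assumed form "region/uuid".
     */
    public static Optional<String> resolveRegion(String configuredRegion, String instanceId) {
        if (configuredRegion != null && !configuredRegion.trim().isEmpty()) {
            return Optional.of(configuredRegion.trim());
        }
        if (instanceId == null) {
            return Optional.empty();
        }
        int slash = instanceId.indexOf('/');
        // No slash, or a leading slash, means there is no region prefix to recover.
        if (slash <= 0) {
            return Optional.empty();
        }
        return Optional.of(instanceId.substring(0, slash));
    }

    public static void main(String[] args) {
        String instanceId = "RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d";
        // The configured region is null, as observed at IP release time.
        System.out.println(resolveRegion(null, instanceId).orElse("<unknown>"));
    }
}
```

A caller could skip the floating IP release and log a warning when the result is empty, rather than letting the NullPointerException abort the termination.
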
> There's also the matter that after stratos detects that the VM is inactive (as shown
> in the log snippet below at 18:57:51), the VM continues to be reported as "ACTIVE" in the topology
> events until it's terminated at 18:59:05. Is there logic in place that will return
> this VM to service if the VM is detected before the CEP publishes the member fault event?
> TID: [0] [STRATOS] [2015-03-23 18:57:51,932]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
-  Sending application instance inactive for [Application] cisco-sample-vm [ApplicationInstance]
cisco-sample-vm-1
> TID: [0] [STRATOS] [2015-03-23 18:57:51,941]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
-  Publishing application inactivated event: [application] cisco-sample-vm [instance] cisco-sample-vm-1
> TID: [0] [STRATOS] [2015-03-23 18:58:51,883]  INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor}
-  Faulty member detected [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
with [last time-stamp] 1427136970708 [time-out] 60000 milliseconds
> TID: [0] [STRATOS] [2015-03-23 18:58:51,884]  INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor}
-  Publishing member fault event for [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
> .....
> TID: [0] [STRATOS] [2015-03-23 18:59:05,887]  INFO {org.apache.stratos.common.client.CloudControllerServiceClient}
-  Terminating instance via cloud controller: [member] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
> -Vanson
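
To make the timing question concrete, here is a rough sketch of the check implied by the "Faulty member detected ... [last time-stamp] ... [time-out] 60000 milliseconds" log line. It is an assumption about the logic, not the FaultHandlingWindowProcessor source: the class name is made up, and whether the real comparison is strict is unknown.

```java
// Hypothetical illustration of a last-seen timeout check; not Stratos code.
public final class FaultTimeoutCheck {
    private FaultTimeoutCheck() {}

    /** A member counts as faulty once more than timeoutMillis has passed since its last event. */
    public static boolean isFaulty(long lastTimestampMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - lastTimestampMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long last = 1427136970708L; // [last time-stamp] from the log above
        long timeout = 60000L;      // [time-out] from the log above
        System.out.println(isFaulty(last, last + 30000L, timeout)); // false
        System.out.println(isFaulty(last, last + 60001L, timeout)); // true
    }
}
```

Under this reading, a member that reports again before the window expires would never be flagged, which is the case the question above is asking about.
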
> On Mon, Mar 23, 2015 at 11:07 AM, Udara Liyanage <udara@wso2.com> wrote:
> Hi,  
> I will have a look.
> On Mon, Mar 23, 2015 at 3:38 AM, Vanson Lim <vlim@cisco.com> wrote:
> Devs,
> We are continuing to work on testing the latest stratos 4.1.0 codebase.
> This problem is seen only for VMs that have a floating IP. I've tested the non-floating-IP
> case and don't see issues.
> The error return code from the jclouds API call is preventing stratos from cleaning up its
> state.
> Stratos seems to forever throw tracebacks as it repeatedly tries to terminate the faulty
instance.
> Meanwhile, the "down" VM is still being reported as active in the topology events, which
> seems wrong. If stratos detects that the VM is faulty, shouldn't it report it immediately
> in the topology events? Stratos currently has the following states defined, and none of them
> seems to be appropriate:
> Created
> Initialized
> Starting
> Active
> In_Maintenance
> ReadyToShutdown
> Suspended
> Terminated
> Do we need a new TIMED-OUT state that stratos reports for a VM while it works to
> terminate it?
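
As a sketch of the proposal only: the enum below lists the states named above plus a hypothetical TimedOut value (spelled without the hyphen, which Java identifiers do not allow). The enum name and the isRoutable rule are assumptions for illustration and are not taken from the Stratos code.

```java
// Hypothetical enum for discussion; the real Stratos member status type may differ.
public enum MemberState {
    Created,
    Initialized,
    Starting,
    Active,
    In_Maintenance,
    ReadyToShutdown,
    Suspended,
    Terminated,
    TimedOut; // proposed state: fault detected, termination still in progress

    /** Assumption: topology consumers should send traffic only to Active members. */
    public boolean isRoutable() {
        return this == Active;
    }
}
```

With a state like this, a faulty member would stop appearing as Active to topology subscribers while the cloud controller is still retrying the termination.
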
> How to reproduce this issue:
> 1) Start a sample cartridge instance that has a floating IP.
> 2) Wait for the sample cartridge to become active.
> 3) Terminate the sample VM via the openstack horizon interface, and wait for stratos to detect
> the VM error.
> Testing using a version of stratos built off the following commit id:
> commit 01dd9e491ad3acf7cc4e0f2895aaba336b82539d
> Author: R-Rajkumar <rraju1990@gmail.com>
> Date:   Fri Mar 20 19:51:06 2015 +0530
>     fixing an NPE in AS
> I've attached the full wso2carbon.log. Included below is the observed traceback:
> -Vanson
> TID: [0] [STRATOS] [2015-03-22 20:53:21,554]  INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor}
-  Publishing member fault event for [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,386]  INFO {org.apache.stratos.common.client.CloudControllerServiceClient}
-  Terminating instance via cloud controller: [member] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,399]  INFO {org.apache.stratos.cloud.controller.iaases.JcloudsIaas}
-  Starting to terminate member: [cartridge-type] cisco-sample-vm [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,450] ERROR {org.apache.stratos.cloud.controller.services.impl.InstanceTerminator}
-  Instance termination failed! MemberContext [applicationId=cisco-sample-vm, cartridgeType=cisco-sample-vm,
clusterId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain, memberId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd,
instanceId=RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d, partition=Partition [id=whole-region,
description=null, isPublic=false, provider=Core, properties=Properties [properties=[Property
[name=region, value=RegionOne]]]], defaultPrivateIP=172.16.2.17, defaultPublicIP=10.0.0.102,
allocatedIPs=[10.0.0.102], publicIPs=[10.0.0.102], privateIPs=[172.16.2.17], initTime=1427057106433,
lbClusterId=null, networkPartitionId=RegionOne, kubernetesPodId=null, kubernetesPodLabel=null,
loadBalancingIPType=Private, instanceMetadata=org.apache.stratos.cloud.controller.domain.InstanceMetadata@5b176e44,
properties=Properties [properties=[Property [name=PRIMARY, value=false], Property [name=MIN_COUNT,
value=1]]]]
> java.lang.NullPointerException: arg[0] in {invocation=org.jclouds.openstack.nova.v2_0.NovaApi.public
abstract com.google.common.base.Optional org.jclouds.openstack.nova.v2_0.NovaApi.getFloatingIPExtensionForZone(java.lang.String)[null],result={annotationParser={caller=NovaApi.getFloatingIPExtensionForZone[null]}}}
>         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:253)
>         at org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:67)
>         at org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:43)
>         at org.jclouds.rest.internal.DelegatesToInvocationFunction.propagateContextToDelegate(DelegatesToInvocationFunction.java:205)
>         at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:154)
>         at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
>         at com.sun.proxy.$Proxy119.getFloatingIPExtensionForZone(Unknown Source)
>         at org.apache.stratos.cloud.controller.iaases.openstack.networking.NovaNetworkingApi.releaseAddress(NovaNetworkingApi.java:239)
>         at org.apache.stratos.cloud.controller.iaases.openstack.OpenstackIaas.releaseAddress(OpenstackIaas.java:239)
>         at org.apache.stratos.cloud.controller.iaases.JcloudsIaas.destroyNode(JcloudsIaas.java:334)
>         at org.apache.stratos.cloud.controller.iaases.JcloudsIaas.terminateInstance(JcloudsIaas.java:314)
>         at org.apache.stratos.cloud.controller.services.impl.InstanceTerminator.run(InstanceTerminator.java:56)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> TID: [0] [STRATOS] [2015-03-22 20:54:21,563]  INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor}
-  Faulty member detected [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
with [last time-stamp] 1427057336960 [time-out] 60000 milliseconds
> TID: [0] [STRATOS] [2015-03-22 20:54:21,563]  INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor}
-  Publishing member fault event for [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> -- 
> Udara Liyanage 
> Software Engineer
> WSO2, Inc.: http://wso2.com 
> lean. enterprise. middleware
> web: http://udaraliyanage.wordpress.com
> phone: +94 71 443 6897



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
