cloudstack-issues mailing list archives

From "dsclose (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (CLOUDSTACK-9024) Restart network fails if redundant router is missing
Date Thu, 17 Mar 2016 14:25:33 GMT

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dsclose closed CLOUDSTACK-9024.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 4.8.0

After upgrading from Cloudstack 4.5.2 to Cloudstack 4.8.0 the restart network behaviour is
now correct.

Closing this issue.

> Restart network fails if redundant router is missing
> ----------------------------------------------------
>
>                 Key: CLOUDSTACK-9024
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9024
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: API, Network Controller, Virtual Router
>    Affects Versions: 4.5.2
>         Environment: Cloudstack 4.5.2, previously updated from Cloudstack 4.3, installed
on CentOS 6.5
>            Reporter: dsclose
>             Fix For: 4.8.0
>
>
> Edit: Included details discussed in comments below.
> Restart network action fails if a network is missing a redundant virtual router. This
occurs whether the restart is triggered via the UI (Networks -> Select Network -> Restart -> Clean-up:
False -> OK) or via the API.
> Steps to reproduce:
> ------------------------
> 1. Create a redundant router network offering.
> 2. Create a network using the redundant router network offering.
> 3. Destroy a redundant router from the network. Leave one functioning.
> 4. Initiate the restart network action or restartNetwork API call with clean-up set to
False.
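
For step 4, the API call can be sketched roughly as follows. This is a minimal illustration, not taken from the original report: the endpoint, keys and network UUID are placeholders, the helper names are hypothetical, and the signing routine follows the HMAC-SHA1 request-signing scheme as I understand it from the CloudStack API documentation (sort the parameters, URL-encode the values, lowercase the string, sign, then Base64-encode).

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params, secret_key):
    """Return the Base64 HMAC-SHA1 signature for a dict of API parameters."""
    # Sort by parameter name, URL-encode each value, then lowercase the whole string.
    pairs = sorted(params.items(), key=lambda kv: kv[0].lower())
    canonical = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs).lower()
    digest = hmac.new(secret_key.encode(), canonical.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def build_restart_network_url(endpoint, network_id, api_key, secret_key):
    """Build a signed restartNetwork URL with cleanup=false (hypothetical helper)."""
    params = {
        "command": "restartNetwork",
        "id": network_id,        # network UUID, placeholder value in the example below
        "cleanup": "false",      # keep existing routers, only recreate missing ones
        "response": "json",
        "apiKey": api_key,
    }
    signature = sign_request(params, secret_key)
    query = "&".join(
        f"{k}={quote(str(v), safe='')}" for k, v in sorted(params.items())
    )
    return f"{endpoint}?{query}&signature={quote(signature, safe='')}"


if __name__ == "__main__":
    # All values here are placeholders, not details from the affected environment.
    print(build_restart_network_url(
        "http://localhost:8080/client/api", "NETWORK-UUID", "API-KEY", "SECRET-KEY"
    ))
```

The call is asynchronous, so the "Network Restart Failed" result described below would appear in the job result rather than in the immediate response.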
> Expected Behaviour:
> --------------------------
> Cloudstack should boot a new redundant virtual router to replace the missing router.
The Network Restart action should return successfully. This is consistent with the Cloudstack
documentation on how to replace faulty virtual routers: "If you are sure that a virtual router
is down forever, or no longer functions as expected, destroy it. ... Recreate the missing
router by using the restartNetwork API with cleanup=false parameter."
> http://cloudstack-administration.readthedocs.org/en/latest/troubleshooting.html#recovering-a-lost-virtual-router
> Observed Behaviour
> --------------------------
> Cloudstack boots a replacement redundant router as expected. The API call, however, returns
a "Network Restart Failed" result. The same is true if the restart is triggered from the Web
UI. The Cloudstack logs report "Can't find all necessary running routers!"
> Hypothesis
> --------------
> This appears to have been introduced as part of CLOUDSTACK-6433. Relevant commit: https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=59a9db3
Commit title: Don't return success if only one RvR builds successfully.
> The code included in that commit throws an exception "Can't find all necessary running
routers!" if a network implementation action doesn't create two virtual routers.
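
The hypothesised check can be modelled in a few lines. This is a simplified Python sketch of the behaviour described above, not the actual Java code from the commit; the function and exception names are hypothetical, and whether the surviving router is counted by the real check is not stated in this report.

```python
class ResourceUnavailableError(Exception):
    """Stand-in for com.cloud.exception.ResourceUnavailableException."""


def required_router_count(redundant):
    # A redundant network offering is expected to have two routers, otherwise one.
    return 2 if redundant else 1


def check_routers(created_routers, redundant):
    """Raise if the implementation action produced fewer routers than required."""
    needed = required_router_count(redundant)
    if len(created_routers) < needed:
        raise ResourceUnavailableError("Can't find all necessary running routers!")
    return created_routers


if __name__ == "__main__":
    # Assumed scenario: only the replacement router is created during the restart.
    try:
        check_routers(["r-992-VM"], redundant=True)
    except ResourceUnavailableError as err:
        print(f"restart reported as failed: {err}")
```

Under this model a restart that only needs to create one replacement router would always be reported as failed, even though the router itself boots correctly, which matches the observed behaviour.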
> Possibly related issues
> -----------------------------
> This may be related to issues CLOUDSTACK-8844 and CLOUDSTACK-8787. I don't have a suitable
environment to test whether the fix for CLOUDSTACK-8844 has resolved this issue as well.
> Timeline:
> -----------
> 2015-11-03 17:12:08,256 Destroying router "r-985-VM".
> 2015-11-03 17:12:24,511 Performing network restart.
> 2015-11-03 17:14:02,851 Failed to restart network
> Management Log Sample
> ---------------------------------
> 2015-11-03 17:12:14,943 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-12:ctx-a671c200
job-163/job-164) Remove job-164 from job monitoring
> 2015-11-03 17:12:15,739 INFO  [o.a.c.s.v.VolumeServiceImpl] (API-Job-Executor-12:ctx-33b24483
job-163 ctx-4d95a357) Volume 985 is not referred anywhere, remove it from volumes table
> 2015-11-03 17:12:15,850 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-12:ctx-33b24483
job-163) Remove job-163 from job monitoring
> 2015-11-03 17:12:18,698 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-13:ctx-c29ad7f0
job-165) Add job-165 into job monitoring
> 2015-11-03 17:12:18,985 INFO  [c.c.n.r.VirtualNetworkApplianceManagerImpl] (API-Job-Executor-13:ctx-c29ad7f0
job-165 ctx-7945f6f9) Use same MAC as previous RvR, the MAC is 06:9c:86:00:00:0e
> 2015-11-03 17:12:19,829 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-13:ctx-06672650
job-165/job-166) Add job-166 into job monitoring
> 2015-11-03 17:12:20,078 INFO  [c.c.s.StorageManagerImpl] (Work-Job-Executor-13:ctx-06672650
job-165/job-166 ctx-81c163bb) Storage pool null (1) does not supply IOPS capacity, assuming
enough capacity
> 2015-11-03 17:12:40,248 INFO  [c.c.v.VirtualMachineManagerImpl] (DirectAgentCronJob-492:ctx-1fb6ecea)
There is pending job or HA tasks working on the VM. vm id: 992, postpone power-change report
by resetting power-change counters
> 2015-11-03 17:13:40,384 INFO  [c.c.v.VirtualMachineManagerImpl] (DirectAgentCronJob-280:ctx-846ef4f0)
There is pending job or HA tasks working on the VM. vm id: 992, postpone power-change report
by resetting power-change counters
> 2015-11-03 17:13:49,799 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-10ff7f5f)
Begin cleanup expired async-jobs
> 2015-11-03 17:13:49,825 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-10ff7f5f)
End cleanup expired async-jobs
> 2015-11-03 17:13:55,688 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-13:ctx-06672650
job-165/job-166) Remove job-166 from job monitoring
> 2015-11-03 17:13:55,730 WARN  [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-13:ctx-c29ad7f0
job-165 ctx-7945f6f9) Failed to implement network Ntwk[208|Guest|15] elements and resources
as a part of network restart due to
> com.cloud.exception.ResourceUnavailableException: Resource [DataCenter:1] is unreachable:
Can't find all necessary running routers!
>         at com.cloud.network.element.VirtualRouterElement.implement(VirtualRouterElement.java:202)
>         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.implementNetworkElementsAndResources(NetworkOrchestrator.java:1103)
>         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.restartNetwork(NetworkOrchestrator.java:2546)
>         at com.cloud.network.NetworkServiceImpl.restartNetwork(NetworkServiceImpl.java:1891)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:106)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at com.sun.proxy.$Proxy157.restartNetwork(Unknown Source)
>         at org.apache.cloudstack.api.command.user.network.RestartNetworkCmd.execute(RestartNetworkCmd.java:95)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:141)
>         at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-11-03 17:13:55,732 WARN  [c.c.n.NetworkServiceImpl] (API-Job-Executor-13:ctx-c29ad7f0
job-165 ctx-7945f6f9) Network id=208 failed to restart.
> 2015-11-03 17:13:55,806 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-13:ctx-c29ad7f0
job-165) Remove job-165 from job monitoring
> 2015-11-03 17:14:00,988 INFO  [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RedundantRouterStatusMonitor-7:ctx-5a3246e2)
Redundant virtual router (name: r-992-VM, id: 992)  just switch from UNKNOWN to BACKUP



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
