cloudstack-users mailing list archives

From Mohammad Rastgoo <moham...@synapti.ca>
Subject Re: Virtual Routers not starting up after host restart
Date Tue, 03 Feb 2015 20:23:51 GMT
Running top on the host CLI shows the qemu process, but nothing actually starts.

Here is the agent log on pastebin:

http://pastebin.com/cf5KY52Q
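
Since the management server log quoted below shows CheckSshCommand failing, one thing worth testing from the KVM host is whether the router's SSH port answers at all. A minimal Python sketch, assuming the usual KVM arrangement in which the host reaches system VMs on their link-local address over port 3922 (both the port and the addressing are assumptions, not values taken from this thread):

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

On the host, calling port_open(vr_link_local_ip, 3922) with the router's actual link-local address would show whether the port is reachable while the VR sits in Starting. A False result while the qemu process is running would suggest the guest is not booting far enough to bring up SSH, or that something on the host is blocking the connection.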

On Mon, Feb 2, 2015 at 4:23 PM, Andrei Mikhailovsky <andrei@arhont.com>
wrote:

> Mohammad, any errors on the host side? Can you check if VRs are being
> created on the host? Also, check if you can get the console (from the
> hypervisor and not from the ACS GUI). Perhaps there is a clue on what's
> happening.
>
> By the way, are your other system VMs working okay? Like the SSVM and CPVM?
>
> Andrei
>
> ----- Original Message -----
>
> > From: "Mohammad Rastgoo" <mohammad@synapti.ca>
> > To: users@cloudstack.apache.org
> > Sent: Monday, 2 February, 2015 7:42:13 PM
> > Subject: Re: Virtual Routers not starting up after host restart
>
> > UP and green.
>
> > On Mon, Feb 2, 2015 at 2:34 PM, Andrei Mikhailovsky
> > <andrei@arhont.com>
> > wrote:
>
> > > From what I can see, the ACS is unable to contact your hypervisor
> > > host
> > > server:
> > >
> > >
> > > 2015-02-02 13:19:17,585 ERROR [c.c.v.VmWorkJobHandlerProxy]
> > > (Work-Job-Executor-24:ctx-114e980e job-244/job-245 ctx-405a7c78)
> > > Invocation
> > > exception, caused by:
> > > com.cloud.exception.AgentUnavailableException:
> > > Resource [Host:1] is unreachable: Host 1: Unable to start instance
> > > due to
> > > Unable to start VM[DomainRouter|r-29-VM] due to error in
> > > finalizeStart, not
> > > retrying
> > >
> > >
> > > What is the status of your host server? Is it shown as
> > > Up/Alert/Disconnected/Connecting?
> > >
> > > Andrei
> > >
> > > ----- Original Message -----
> > > > From: "Mohammad Rastgoo" <mohammad@synapti.ca>
> > > > To: users@cloudstack.apache.org
> > > > Sent: Monday, 2 February, 2015 7:27:30 PM
> > > > Subject: Re: Virtual Routers not starting up after host restart
> > > >
> > > > Andrei,
> > > >
> > > > > Below is the partial MS log. I have marked a couple of parts in bold.
> > > > > It might be a dumb idea, but my first thought was that iptables is
> > > > > causing it, though I have no good explanation for why.
> > > >
> > > > 2015-02-02 13:17:14,152 WARN [o.a.c.alerts]
> > > > (Work-Job-Executor-24:ctx-114e980e job-244/job-245 ctx-405a7c78)
> > > > alertType:: 9 // dataCenterId:: 1 // podId:: 1 // clusterId::
> > > > null //
> > > > message:: Command: com.cloud.agent.api.check.CheckSshCommand
> > > > failed while
> > > > starting virtual router
> > > > 2015-02-02 13:17:14,233 WARN
> > > [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> > > > (Work-Job-Executor-24:ctx-114e980e job-244/job-245 ctx-405a7c78)
> > > > Command:
> > > > com.cloud.agent.api.check.CheckSshCommand failed while starting
> > > > virtual
> > > > router
> > > > 2015-02-02 13:17:49,620 WARN [o.a.c.f.j.i.AsyncJobMonitor]
> > > > (Timer-1:ctx-3041bbe4) Task (job-244) has been pending for 1134
> > > > seconds
> > > > 2015-02-02 13:17:49,620 WARN [o.a.c.f.j.i.AsyncJobMonitor]
> > > > (Timer-1:ctx-3041bbe4) Task (job-245) has been pending for 1133
> > > > seconds
> > > > 2015-02-02 13:18:49,620 WARN [o.a.c.f.j.i.AsyncJobMonitor]
> > > > (Timer-1:ctx-526a6af6) Task (job-244) has been pending for 1194
> > > > seconds
> > > > 2015-02-02 13:18:49,620 WARN [o.a.c.f.j.i.AsyncJobMonitor]
> > > > (Timer-1:ctx-526a6af6) Task (job-245) has been pending for 1193
> > > > seconds
> > > > 2015-02-02 13:19:16,969 ERROR [c.c.v.VirtualMachineManagerImpl]
> > > > (Work-Job-Executor-24:ctx-114e980e job-244/job-245 ctx-405a7c78)
> > > > Failed
> > > to
> > > > start instance VM[DomainRouter|r-29-VM]
> > > > com.cloud.utils.exception.ExecutionException: Unable to start
> > > > VM[DomainRouter|r-29-VM] due to error in finalizeStart, not
> > > > retrying
> > > > 2015-02-02 13:19:17,518 DEBUG [c.c.c.CapacityManagerImpl]
> > > > (Work-Job-Executor-24:ctx-114e980e job-244/job-245 ctx-405a7c78)
> > > > VM state
> > > > transitted from :Starting to Stopped with event:
> > > > OperationFailedvm's
> > > > original host id: null new host id: null host id before state
> > > transition: 1
> > > > > 2015-02-02 13:19:17,585 ERROR [c.c.v.VmWorkJobHandlerProxy]
> > > > > (Work-Job-Executor-24:ctx-114e980e job-244/job-245 ctx-405a7c78)
> > > > > Invocation exception, caused by:
> > > > > com.cloud.exception.AgentUnavailableException:
> > > > > Resource [Host:1] is unreachable: Host 1: Unable to start
> > > > > instance due to Unable to start VM[DomainRouter|r-29-VM] due to
> > > > > error in finalizeStart, not retrying
> > > > > 2015-02-02 13:19:17,585 INFO [c.c.v.VmWorkJobHandlerProxy]
> > > > > (Work-Job-Executor-24:ctx-114e980e job-244/job-245 ctx-405a7c78)
> > > > > Rethrow exception com.cloud.exception.AgentUnavailableException:
> > > > > Resource [Host:1] is unreachable: Host 1: Unable to start instance
> > > > > due to Unable to start VM[DomainRouter|r-29-VM] due to error in
> > > > > finalizeStart, not retrying
> > > > 2015-02-02 13:19:17,586 ERROR [c.c.v.VmWorkJobDispatcher]
> > > > (Work-Job-Executor-24:ctx-114e980e job-244/job-245) Unable to
> > > > complete
> > > > AsyncJobVO {id:245, userId: 2, accountId: 2, instanceType: null,
> > > > instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo:
> > > >
> > >
> rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAHXQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwc3IAEWphdmEudXRpbC5IYXNoTWFwBQfawcMWYNEDAAJGAApsb2FkRmFjdG9ySQAJdGhyZXNob2xkeHA_QAAAAAAADHcIAAAAEAAAAAF0AA5SZXN0YXJ0TmV0d29ya3QAP3JPMEFCWE55QUJGcVlYWmhMbXhoYm1jdVFtOXZiR1ZoYnMwZ2NvRFZuUHJ1QWdBQldnQUZkbUZzZFdWNGNBRXhw,
> > > > cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode:
> > > > 0,
> > > > result: null, initMsid: 161333667508, completeMsid: null,
> > > > lastUpdated:
> > > > null, lastPolled: null, created: Mon Feb 02 12:58:55 EST 2015},
> > > > job
> > > > origin:244
> > > >
> > > >
> > > > > *com.cloud.exception.AgentUnavailableException: Resource [Host:1]
> > > > > is unreachable: Host 1: Unable to start instance due to Unable to
> > > > > start VM[DomainRouter|r-29-VM] due to error in finalizeStart, not
> > > > > retrying
> > > > > Caused by: com.cloud.utils.exception.ExecutionException: Unable to
> > > > > start VM[DomainRouter|r-29-VM] due to error in finalizeStart, not
> > > > > retrying*
> > > >
> > > > > 2015-02-02 13:19:17,930 WARN [o.a.c.e.o.NetworkOrchestrator]
> > > > > (API-Job-Executor-16:ctx-0dfa85ec job-244 ctx-2d8a3616) Failed to
> > > > > implement network Ntwk[f3a318a2-d6f0-4fcb-be94-4e4586cc20a3|Guest|7]
> > > > > elements and resources as a part of network restart due to
> > > > > java.lang.RuntimeException: *Job failed due to exception Resource
> > > > > [Host:1] is unreachable: Host 1: Unable to start instance due to
> > > > > Unable to start VM[DomainRouter|r-29-VM] due to error in
> > > > > finalizeStart, not retrying*
> > > > 2015-02-02 13:19:17,930 WARN [c.c.n.NetworkServiceImpl]
> > > > (API-Job-Executor-16:ctx-0dfa85ec job-244 ctx-2d8a3616) Network
> > > > id=207
> > > > failed to restart.
> > > > 2015-02-02 13:19:18,135 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> > > > (API-Job-Executor-16:ctx-0dfa85ec job-244) Complete async
> > > > job-244,
> > > > jobStatus: FAILED, resultCode: 530, result:
> > > >
> > >
> org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
> > > > to restart network"}
> > > > 2015-02-02 13:23:18,345 WARN
> > > > [c.c.a.d.ParamGenericValidationWorker]
> > > > (catalina-exec-19:ctx-b153f3e0 ctx-d5075366) Received unknown
> > > > parameters
> > > > for command listNetworks. Unknown parameters : details
> > > >
> > > > On Mon, Feb 2, 2015 at 2:19 PM, Andrei Mikhailovsky
> > > > <andrei@arhont.com>
> > > > wrote:
> > > >
> > > > > > Mohammad, what does the management server log say when you try
> > > > > > to start the VRs? It should give a clue as to why they are not
> > > > > > starting.
> > > > >
> > > > > Andrei
> > > > >
> > > > > ----- Original Message -----
> > > > >
> > > > > > From: "Mohammad Rastgoo" <mohammad@synapti.ca>
> > > > > > To: users@cloudstack.apache.org
> > > > > > Sent: Monday, 2 February, 2015 6:06:41 PM
> > > > > > Subject: Virtual Routers not starting up after host restart
> > > > >
> > > > > > Hi,
> > > > >
> > > > > > Thanks for reading this.
> > > > >
> > > > > > I have this setup:
> > > > > > server 1: MS + DB
> > > > > > server 2: secondary storage NFS
> > > > > > server 3: kvm - local primary
> > > > > > (all centos 6.6)
> > > > > > net1: isolated network 10.0.0.0/x
> > > > > > net2: shared network (public ip)
> > > > >
> > > > > > Here are the steps I took:
> > > > >
> > > > > > 1- stopped all VMs
> > > > > > 2- stopped system VMs (not VRs)
> > > > > > 3- yum updated glibc + reboot on all servers
> > > > >
> > > > > > > Now here is the situation: net2 has remained in the Setup state
> > > > > > > and net1 in Allocated.
> > > > >
> > > > > > > The system VMs are back on. The VRs go to Starting and then Stopped.
> > > > >
> > > > > > so far, I have deleted VRs and restarted networks + clean up.
> > > > > > no
> > > > > > luck.
> > > > >
> > > > > > has anyone encountered the same problem? am I missing
> > > > > > anything here?
> > > > >
> > > > > > Any help is highly appreciated. Tnx
> > > > >
> > > > > > --
> > > > > > Mohammad Rastgoo
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Mohammad Rastgoo
> > > > Founder & CEO
> > > >
> > >
>
> > --
> > Mohammad Rastgoo
> > Founder & CEO
>



-- 
Mohammad Rastgoo
Founder & CEO
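
For anyone following the same trail, the management server log excerpts quoted above are easier to read once they are filtered down to one job. A rough Python sketch, assuming each log entry is a single line laid out as timestamp, level, [logger], (thread and context ids), message, as it appears in the excerpts; the helper names are made up for illustration:

```python
import re

# Layout inferred from the log lines quoted in this thread:
# timestamp, level, [logger], (thread and context ids), message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<ctx>[^)]*)\)\s*"
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Job ids such as job-244/job-245 sit inside the parenthesised context.
    entry["jobs"] = re.findall(r"job-\d+", entry["ctx"])
    return entry


def problems_for_job(lines, job_id, levels=("WARN", "ERROR")):
    """Return parsed WARN/ERROR entries whose context mentions the given job."""
    wanted = "job-%d" % job_id
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] in levels and wanted in entry["jobs"]:
            hits.append(entry)
    return hits
```

Fed the lines of the management server log (on a default install usually /var/log/cloudstack/management/management-server.log), problems_for_job(lines, 245) would return only the warnings and errors tied to that job. Entries that mention the job only in the message text, such as the AsyncJobMonitor "has been pending" warnings, are deliberately not matched by this sketch.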
