cloudstack-users mailing list archives

From Andrija Panic <andrija.pa...@shapeblue.com>
Subject RE: Disaster after maintenance
Date Tue, 19 Mar 2019 14:19:27 GMT
​​
Your network can't be deleted due to "Can't delete the network, not all user vms are
expunged. Vm VM[User|i-2-11-VM] is in Stopped state" - which is fine.

You should be able to just start the user VM. But if you have actually deleted the VR itself,
then restart the network with the "cleanup" option, which will create a new VR. After that you
should be able to start the VM.
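
If you would rather script the restart than click through the UI, here is a minimal sketch of the equivalent API call. It assumes the standard CloudStack request signing (sorted parameters, lowercased, HMAC-SHA1 with your secret key) and the restartNetwork command with cleanup=true. The function names, the keys, and the management server address are placeholders for illustration, not anything from your setup.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_query(params, api_key, secret_key):
    """Build a signed CloudStack API query string (helper name is hypothetical)."""
    full = dict(params, apiKey=api_key, response="json")
    # Sort by parameter name and URL-encode each value (spaces become %20).
    pairs = sorted(full.items(), key=lambda kv: kv[0].lower())
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs)
    # The signature is HMAC-SHA1 over the lowercased query, then base64.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return f"{query}&signature={quote(signature, safe='')}"


def restart_network_query(network_id, api_key, secret_key):
    """Query string for restartNetwork with cleanup, which recreates the VR."""
    params = {"command": "restartNetwork", "id": network_id, "cleanup": "true"}
    return signed_query(params, api_key, secret_key)


if __name__ == "__main__":
    # Network UUID taken from the log in this thread; keys are placeholders.
    q = restart_network_query("4ba834ed-48f3-468f-b667-9bb2d2c258f1", "API_KEY", "SECRET_KEY")
    # Assumed management server address; adjust to your environment.
    print("http://management-server:8080/client/api?" + q)
```

The call is asynchronous, so the response only carries a job id. The result still has to be polled with queryAsyncJobResult, the same way the UI does it.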

Andrija

andrija.panic@shapeblue.com 
www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DP UK
@shapeblue
  
 


-----Original Message-----
From: Jevgeni Zolotarjov <j.zolotarjov@gmail.com> 
Sent: 19 March 2019 15:10
To: users@cloudstack.apache.org
Subject: Re: Disaster after maintenance

I mean I cannot delete the network. In the management server log I see:

==========================================================
2019-03-19 14:06:36,316 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-1c0fd4dc job-5090) (logid:c734edfc) Executing AsyncJobVO {id:5090, userId: 2, accountId: 2, instanceType: Network, instanceId: 204, cmd: org.apache.cloudstack.api.command.user.network.DeleteNetworkCmd, cmdInfo: {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"2641","id":"4ba834ed-48f3-468f-b667-9bb2d2c258f1","ctxDetails":"{\"interface com.cloud.network.Network\":\"4ba834ed-48f3-468f-b667-9bb2d2c258f1\"}","ctxAccountId":"2","uuid":"4ba834ed-48f3-468f-b667-9bb2d2c258f1","cmdEventType":"NETWORK.DELETE","_":"1553004396247"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 264216221068220, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2019-03-19 14:06:36,351 WARN  [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-1:ctx-1c0fd4dc job-5090 ctx-134954fa) (logid:c734edfc) Can't delete the network, not all user vms are expunged. Vm VM[User|i-2-11-VM] is in Stopped state
2019-03-19 14:06:36,356 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-1c0fd4dc job-5090) (logid:c734edfc) Complete async job-5090, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed to delete network"}
==========================================================


I deleted the router, expecting it to be recreated when the network was deleted. But I am
unable to delete the network because of the error above.
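
The useful line in an excerpt like this is the WARN entry, since the "Failed to delete network" result on its own does not say why. As a rough sketch, the warnings for one async job could be pulled out of the management server log like this. The regular expression is an assumption based only on the line layout shown in this thread, and the function name is made up for illustration.

```python
import re

# Layout assumed from the excerpt: date time LEVEL [logger] (thread ctx job-N ...) (logid:x) message
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+)\s+\[\S+\] \((?P<ctx>[^)]*)\) \(logid:\w+\) (?P<msg>.*)$"
)


def warnings_for_job(lines, job_id):
    """Return the messages of WARN entries logged in the context of job-<job_id>."""
    tag = f"job-{job_id}"
    found = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        # Match the job tag as a whole token so job-509 does not match job-5090.
        if m and m["level"] == "WARN" and tag in m["ctx"].split():
            found.append(m["msg"])
    return found
```

It can be fed an open log file directly, for example `warnings_for_job(open(path), 5090)`, where `path` is wherever your management server writes its log (commonly /var/log/cloudstack/management/management-server.log, but check your installation).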

On Tue, Mar 19, 2019 at 3:58 PM Jevgeni Zolotarjov <j.zolotarjov@gmail.com>
wrote:

> I've managed to get libvirtd running.
> Now the CloudStack console shows both hosts as running.
>
> But now that I have removed the network, VMs are unable to start.
>
> How can I recreate the network now?
>
> On Tue, Mar 19, 2019 at 3:14 PM Ivan Kudryavtsev 
> <kudryavtsev_ia@bw-sw.com>
> wrote:
>
>> Jevgeniy, it may be a documentation bug. Take a look:
>> https://github.com/apache/cloudstack-documentation/pull/27/files
>>
>> On Tue, 19 Mar 2019, 9:09 Jevgeni Zolotarjov <j.zolotarjov@gmail.com> wrote:
>>
>> > That's it - libvirtd failed to start on second host.
>> > Tried restarting, but it does not start.
>> >
>> >
>> > >> Do you have some NUMA constraints or anything which requires particular
>> > >> RAM configuration?
>> > No
>> >
>> >  libvirtd.service - Virtualization daemon
>> >    Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
>> >    Active: failed (Result: start-limit) since Tue 2019-03-19 13:03:07 GMT; 12s ago
>> >      Docs: man:libvirtd(8)
>> >            https://libvirt.org
>> >   Process: 892 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=1/FAILURE)
>> >  Main PID: 892 (code=exited, status=1/FAILURE)
>> >     Tasks: 19 (limit: 32768)
>> >    CGroup: /system.slice/libvirtd.service
>> >            ├─11338 /usr/sbin/libvirtd -d -l
>> >            ├─11909 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
>> >            └─11910 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
>> >
>> > Mar 19 13:03:07 mtl1-apphst04.mt.pbt.com.mt systemd[1]: Failed to start Virtualization daemon.
>> > Mar 19 13:03:07 mtl1-apphst04.mt.pbt.com.mt systemd[1]: Unit libvirtd.service entered failed state.
>> > Mar 19 13:03:07 mtl1-apphst04.mt.pbt.com.mt systemd[1]: libvirtd.service failed.
>> > Mar 19 13:03:07 mtl1-apphst04.mt.pbt.com.mt systemd[1]: libvirtd.service holdoff time over, scheduling restart.
>> > Mar 19 13:03:07 mtl1-apphst04.mt.pbt.com.mt systemd[1]: Stopped Virtualization daemon.
>> > Mar 19 13:03:07 mtl1-apphst04.mt.pbt.com.mt systemd[1]: start request repeated too quickly for libvirtd.service
>> > Mar 19 13:03:07 mtl1-apphst04.mt.pbt.com.mt systemd[1]: Failed to start Virtualization daemon.
>> > Mar 19 13:03:07 mtl1-apphst04.mt.pbt.com.mt systemd[1]: Unit libvirtd.service entered failed state.
>> > Mar 19 13:03:07 mtl1-apphst04.mt.pbt.com.mt systemd[1]: libvirtd.service failed.
>> >
>> >
>> > On Tue, Mar 19, 2019 at 3:04 PM Paul Angus 
>> > <paul.angus@shapeblue.com>
>> > wrote:
>> >
>> > > Can you check that the CloudStack agent is running on the host, and
>> > > check the agent logs (usual logs directory)? Also worth checking
>> > > that libvirt has started ok. Do you have some NUMA constraints or
>> > > anything which requires particular RAM configuration?
>> > >
>> > > paul.angus@shapeblue.com
>> > > www.shapeblue.com
>> > > Amadeus House, Floral Street, London  WC2E 9DP UK @shapeblue
>> > >
>> > >
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: Jevgeni Zolotarjov <j.zolotarjov@gmail.com>
>> > > Sent: 19 March 2019 14:49
>> > > To: users@cloudstack.apache.org
>> > > Subject: Re: Disaster after maintenance
>> > >
>> > > Can you try migrating a VM to the server that you changed the RAM amount?
>> > >
>> > > Also:
>> > > What is the hypervisor version?
>> > > KVM
>> > > QEMU Version     : 2.0.0
>> > > Release     : 1.el7.6
>> > >
>> > >
>> > > Host status in ACS?
>> > > 1st server: Unsecure
>> > > 2nd server: Disconnected
>> > >
>> > > Did you try to force a VM to start/deploy in this server where you changed
>> > > the RAM?
>> > > Host status became disconnected. I don't know how to make it "connected"
>> > > again
>> > >
>> > >
>> > >
>> > > On Tue, Mar 19, 2019 at 2:42 PM Rafael Weingärtner <
>> > > rafaelweingartner@gmail.com> wrote:
>> > >
>> > > > Can you try migrating a VM to the server that you changed the RAM amount?
>> > > >
>> > > > Also:
>> > > > What is the hypervisor version?
>> > > > Host status in ACS?
>> > > > Did you try to force a VM to start/deploy in this server where you
>> > > > changed the RAM?
>> > > >
>> > > >
>> > > > On Tue, Mar 19, 2019 at 9:39 AM Jevgeni Zolotarjov <j.zolotarjov@gmail.com> wrote:
>> > > >
>> > > > > We have had a CloudStack 4.11.2 setup running fine for a few months (>4).
>> > > > > The setup is very simple: 2 hosts. We decided to do maintenance to
>> > > > > increase RAM on both servers.
>> > > > >
>> > > > > For this we put the first server into maintenance. All VMs moved to the
>> > > > > second host after a while.
>> > > > >
>> > > > > Then the first server was shut down, RAM increased, and the server turned on.
>> > > > > Now nothing starts on the first server.
>> > > > >
>> > > > >
>> > > > > Tried to delete the network, but this fails as well.
>> > > > >
>> > > > > Please help!
>> > > > >
>> > > > > Here is an extract from the log:
>> > > > > ======================================
>> > > > > 2019-03-19 12:27:53,064 DEBUG [o.a.c.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-16d6c797) (logid:7e3160ce) Zone 1 is ready to launch secondary storage VM
>> > > > > 2019-03-19 12:27:53,125 DEBUG [c.c.c.ConsoleProxyManagerImpl] (consoleproxy-1:ctx-cbd034b9) (logid:0a8c8bf4) Zone 1 is ready to launch console proxy
>> > > > > 2019-03-19 12:27:53,181 DEBUG [c.c.a.ApiServlet] (qtp510113906-285:ctx-6c5e11c3) (logid:cd8e30be) ===START=== 192.168.5.140 -- GET command=deleteNetwork&id=4ba834ed-48f3-468f-b667-9bb2d2c258f1&response=json&_=1552998473154
>> > > > > 2019-03-19 12:27:53,186 DEBUG [c.c.a.ApiServer] (qtp510113906-285:ctx-6c5e11c3 ctx-0cc34dc6) (logid:cd8e30be) CIDRs from which account 'Acct[15863393-8e8d-11e7-8f52-f04da2002bbe-admin]' is allowed to perform API calls: 0.0.0.0/0,::/0
>> > > > > 2019-03-19 12:27:53,208 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-1:ctx-d4970c19 job-5081) (logid:f6751fa7) Add job-5081 into job monitoring
>> > > > > 2019-03-19 12:27:53,209 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp510113906-285:ctx-6c5e11c3 ctx-0cc34dc6) (logid:cd8e30be) submit async job-5081, details: AsyncJobVO {id:5081, userId: 2, accountId: 2, instanceType: Network, instanceId: 204, cmd: org.apache.cloudstack.api.command.user.network.DeleteNetworkCmd, cmdInfo: {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"2615","id":"4ba834ed-48f3-468f-b667-9bb2d2c258f1","ctxDetails":"{\"interface com.cloud.network.Network\":\"4ba834ed-48f3-468f-b667-9bb2d2c258f1\"}","ctxAccountId":"2","uuid":"4ba834ed-48f3-468f-b667-9bb2d2c258f1","cmdEventType":"NETWORK.DELETE","_":"1552998473154"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 264216221068220, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
>> > > > > 2019-03-19 12:27:53,211 DEBUG [c.c.a.ApiServlet] (qtp510113906-285:ctx-6c5e11c3 ctx-0cc34dc6) (logid:cd8e30be) ===END=== 192.168.5.140 -- GET command=deleteNetwork&id=4ba834ed-48f3-468f-b667-9bb2d2c258f1&response=json&_=1552998473154
>> > > > > 2019-03-19 12:27:53,212 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-d4970c19 job-5081) (logid:16897ea6) Executing AsyncJobVO {id:5081, userId: 2, accountId: 2, instanceType: Network, instanceId: 204, cmd: org.apache.cloudstack.api.command.user.network.DeleteNetworkCmd, cmdInfo: {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"2615","id":"4ba834ed-48f3-468f-b667-9bb2d2c258f1","ctxDetails":"{\"interface com.cloud.network.Network\":\"4ba834ed-48f3-468f-b667-9bb2d2c258f1\"}","ctxAccountId":"2","uuid":"4ba834ed-48f3-468f-b667-9bb2d2c258f1","cmdEventType":"NETWORK.DELETE","_":"1552998473154"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 264216221068220, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
>> > > > > 2019-03-19 12:27:53,257 WARN  [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-1:ctx-d4970c19 job-5081 ctx-d5de7979) (logid:16897ea6) Can't delete the network, not all user vms are expunged. Vm VM[User|i-2-11-VM] is in Stopped state
>> > > > > 2019-03-19 12:27:53,263 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-d4970c19 job-5081) (logid:16897ea6) Complete async job-5081, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed to delete network"}
>> > > > > 2019-03-19 12:27:53,264 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-d4970c19 job-5081) (logid:16897ea6) Publish async job-5081 complete on message bus
>> > > > > 2019-03-19 12:27:53,264 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-d4970c19 job-5081) (logid:16897ea6) Wake up jobs related to job-5081
>> > > > > 2019-03-19 12:27:53,264 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-d4970c19 job-5081) (logid:16897ea6) Update db status for job-5081
>> > > > > 2019-03-19 12:27:53,265 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-d4970c19 job-5081) (logid:16897ea6) Wake up jobs joined with job-5081 and disjoin all subjobs created from job- 5081
>> > > > > 2019-03-19 12:27:53,267 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-d4970c19 job-5081) (logid:16897ea6) Done executing org.apache.cloudstack.api.command.user.network.DeleteNetworkCmd for job-5081
>> > > > > 2019-03-19 12:27:53,267 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-1:ctx-d4970c19 job-5081) (logid:16897ea6) Remove job-5081 from job monitoring
>> > > > > 2019-03-19 12:27:56,230 DEBUG [c.c.a.ApiServlet] (qtp510113906-28:ctx-e6c5bc85) (logid:7fe68f75) ===START=== 192.168.5.140 -- GET command=queryAsyncJobResult&jobId=16897ea6-27c3-45b9-a0df-ab217bb5393c&response=json&_=1552998476202
>> > > > > 2019-03-19 12:27:56,238 DEBUG [c.c.a.ApiServer] (qtp510113906-28:ctx-e6c5bc85 ctx-da1f4cbd) (logid:7fe68f75) CIDRs from which account 'Acct[15863393-8e8d-11e7-8f52-f04da2002bbe-admin]' is allowed to perform API calls: 0.0.0.0/0,::/0
>> > > > > 2019-03-19 12:27:56,260 DEBUG [c.c.a.ApiServlet] (qtp510113906-28:ctx-e6c5bc85 ctx-da1f4cbd) (logid:7fe68f75) ===END=== 192.168.5.140 -- GET command=queryAsyncJobResult&jobId=16897ea6-27c3-45b9-a0df-ab217bb5393c&response=json&_=1552998476202
>> > > > > 2019-03-19 12:28:00,946 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-9b43d1fd) (logid:a605267a) Begin cleanup expired async-jobs
>> > > > > 2019-03-19 12:28:00,951 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-9b43d1fd) (logid:a605267a) End cleanup expired async-jobs
>> > > > > 2019-03-19 12:28:01,142 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-ad6bbe7e) (logid:04e4c72b) Found 0 routers to update status.
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Rafael Weingärtner
>> > > >
>> > >
>> >
>>
>