cloudstack-users mailing list archives

From Douglas Land <...@looprock.com>
Subject Re: Issue with 'stuck' virtual routers
Date Wed, 06 Jul 2016 02:38:48 GMT
I was able to resolve this issue today with a good deal of help from Simon
Weller (thanks!)

We're running local storage, and for some reason, although we put the host in
maintenance mode and shut down the virtual routers (I didn't do this
operation so I'm not 100% certain how we went about it), both routers got
stuck in the Expunging state. We completely redisked the server, so that
storage pool no longer existed and the virtual host certainly didn't
exist in any capacity, though the routers were still showing up in the API calls.

I tried a restartNetwork cleanup=false operation, which reported success
for one network and failure for the second, but no virtual routers were
created as a result. Eventually I deleted the migration jobs from op_ha_work,
and the instances themselves from vm_instance. It felt a bit drastic, but
marking them as Destroyed didn't seem to help and I was unable to issue a
destroy command via the API. So far that appears to have taken care of the
situation. Once I manually deleted the entries I was able to execute
restartNetwork cleanup=false successfully and the virtual routers were
recreated.
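
For anyone checking for the same condition, here is a minimal sketch of how routers stuck in a given state could be picked out of saved `list routers` JSON output. The function name `stuck_routers` and the trimmed sample are hypothetical; the `router`, `name`, `id` and `state` fields are taken from the cloudmonkey output quoted further down in this thread.

```python
import json


def stuck_routers(listing_json, states=("Expunging",)):
    """Return (name, id, state) tuples for routers whose state is in `states`.

    `listing_json` is assumed to be the JSON text printed by a
    `list routers` call, shaped like the output quoted in this thread.
    """
    data = json.loads(listing_json)
    return [
        (r.get("name"), r.get("id"), r.get("state"))
        for r in data.get("router", [])  # no "router" key means no routers
        if r.get("state") in states
    ]


# Trimmed-down copy of the router entry quoted in this thread.
sample = """
{"count": 1,
 "router": [{"id": "dc48a402-41d8-4e93-b441-4b34eb83a4c8",
             "name": "r-78-VM",
             "state": "Expunging"}]}
"""

for name, router_id, state in stuck_routers(sample):
    print(name, router_id, state)
```

This only reads the API output; it does not touch the database or issue any destroy call.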

Frankly I'm a little nervous there might be other references to them in the
database that might haunt us later, and when we are able to schedule a
maintenance window I'm planning to do a restartNetwork cleanup=true to make
sure that works.
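
One way to look for leftover references without changing anything is to run read-only counts against candidate tables. The sketch below only builds the SQL text; the helper name `reference_check_queries` is hypothetical, and `op_ha_work.instance_id` with instance ids 13 and 78 is the only table/column pair confirmed by this thread. Any further pairs are an assumption and would have to be identified by inspecting the schema first.

```python
def reference_check_queries(columns, instance_ids):
    """Build read-only SELECT COUNT(*) statements, one per (table, column) pair.

    Nothing is executed here; the statements are meant to be reviewed and
    then run by hand in the mysql client.
    """
    # int() guards against anything non-numeric ending up in the SQL text.
    ids = ", ".join(str(int(i)) for i in instance_ids)
    return [
        f"SELECT COUNT(*) FROM `{table}` WHERE `{column}` IN ({ids});"
        for table, column in columns
    ]


# Only this pair appears in the thread; extend the list after checking the schema.
candidates = [("op_ha_work", "instance_id")]

for query in reference_check_queries(candidates, [13, 78]):
    print(query)
```

A non-zero count would point at rows still referring to the removed routers.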

On Tue, Jul 5, 2016 at 5:51 PM, ilya <ilya.mailing.lists@gmail.com> wrote:

> Hi Doug
>
> Do you have primary storage id 18 available?
>
> # cloudmonkey list storagepools id=18
>
> I can only assume cloudstack tries to clean up after itself and fails -
> because storage pool 18 is not available.
>
> Are you running a local storage zone or clustered?
>
> Lastly, your logs would indicate the issue more clearly - as to why it's
> not able to expunge.
>
> Regards
> ilya
>
> On 7/5/16 9:15 AM, Douglas Land wrote:
> > We pulled a host from the pool for upgrades, and in the process seem to
> > have gotten a virtual router in an odd state. It's showing as destroyed in
> > the UI, but cloudmonkey says it's still expunging.
> >
> > This host has been completely rebuilt, including being completely
> > redisked. On the management node I found:
> >
> > mysql> select * from op_ha_work;
> > +----+-------------+-----------+--------------+-----------+----------------+---------+---------------------+-------+-------+-----------+-------------+---------+
> > | id | instance_id | type      | vm_type      | state     | mgmt_server_id | host_id | created             | tried | taken | step      | time_to_try | updated |
> > +----+-------------+-----------+--------------+-----------+----------------+---------+---------------------+-------+-------+-----------+-------------+---------+
> > |  1 |          13 | Migration | DomainRouter | Expunging |           NULL |      24 | 2016-07-01 14:34:17 |     0 | NULL  | Migrating |  1433332034 |     205 |
> > |  4 |          78 | Migration | DomainRouter | Destroyed |           NULL |      24 | 2016-07-01 14:34:17 |     0 | NULL  | Migrating |  1433332092 |      68 |
> > +----+-------------+-----------+--------------+-----------+----------------+---------+---------------------+-------+-------+-----------+-------------+---------+
> >
> > I removed those entries, but the routers persist. Via cloudmonkey it
> > shows expunging:
> > {
> >   "count": 1,
> >   "router": [
> >     {
> >       "account": "engineering",
> >       "created": "2014-09-05T03:56:07+0200",
> >       "dns1": "172.16.8.46",
> >       "dns2": "172.16.8.47",
> >       "domain": "engineering",
> >       "domainid": "1da498ba-5646-4cc3-a704-a20ebe12f518",
> >       "id": "dc48a402-41d8-4e93-b441-4b34eb83a4c8",
> >       "isredundantrouter": true,
> >       "name": "r-78-VM",
> >       "nic": [],
> >       "podid": "f53afa8d-51ff-484d-9a88-52e979aeb688",
> >       "redundantstate": "UNKNOWN",
> >       "requiresupgrade": false,
> >       "role": "VIRTUAL_ROUTER",
> >       "serviceofferingid": "ed6b13d0-3e74-4aa5-a6b7-a5d2ac6c4a6c",
> >       "serviceofferingname": "System Offering For Software Router",
> >       "state": "Expunging",
> >       "templateid": "bb3f7e4e-d7f6-4a72-a752-12c3221e43e9",
> >       "version": "4.4.1",
> >       "zoneid": "3467ff63-b582-4ace-9fda-8d5851bd8753",
> >       "zonename": "Oakland"
> >     }
> >   ]
> > }
> >
> > If I try to destroy the router from the API I get:
> >
> > Async job cf08d7fa-1609-4d0e-b33c-63cc38f7e897 failed
> > Error 530, Unable to locate datastore with id 18
> > {
> >   "accountid": "e3389462-6020-425a-9b9e-57141d58e1ab",
> >   "cmd": "org.apache.cloudstack.api.command.admin.router.DestroyRouterCmd",
> >   "created": "2016-07-05T17:23:53+0200",
> >   "jobid": "cf08d7fa-1609-4d0e-b33c-63cc38f7e897",
> >   "jobprocstatus": 0,
> >   "jobresult": {
> >     "errorcode": 530,
> >     "errortext": "Unable to locate datastore with id 18"
> >   },
> >   "jobresultcode": 530,
> >   "jobresulttype": "object",
> >   "jobstatus": 2,
> >   "userid": "xxx"
> > }
> >
> > I'm guessing I need to remove all references for the routers from the
> > database. Does anyone know what table(s) that's stored in?
> >
>
