cloudstack-users mailing list archives

From Alena Prokharchyk <Alena.Prokharc...@citrix.com>
Subject Re: System VMs restarted on a disabled cluster
Date Mon, 23 Jul 2012 16:40:28 GMT
Could you please provide the management server log? The API error doesn't
give many details on what happened, which in most cases indicates an NPE.

Thanks,
-Alena.

On 7/20/12 6:27 PM, "Evan Miller" <Evan.Miller@citrix.com> wrote:

>Hi Alena:
>
>OK. I tried forced=true for deleting a persistent StoragePool
>that I can't delete from the GUI. forced=true doesn't work.
>Here is my attempt:
>
>FINAL URL AFTER SPECIAL SUBSTITUTION(S):
>
>http://10.217.5.192:8080/client/api?apikey=OjtS1QBLFBFdu-gpDlS30r8dC1XhxyMr-UzdQ4Mi6GuXqCuzY882MI7PX2qNX80CkDMNYu5sTMc3yQeHwUmPqg&command=listStoragePools&response=json&signature=P%2BeoDaq4yOHobbZao5pIaiARl18%3D
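These signed URLs follow CloudStack's request-signing scheme: sort the query parameters, lowercase the query string, HMAC-SHA1 it with the account's secret key, then base64- and URL-encode the result. A minimal sketch, using placeholder credentials rather than the real keys from this thread:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def sign_request(params: dict, secret_key: str) -> str:
    """Build a signed CloudStack API query string (sketch)."""
    # Sort parameters by name and URL-encode each value.
    query = "&".join(
        f"{k}={quote(str(v), safe='')}" for k, v in sorted(params.items())
    )
    # The HMAC-SHA1 signature is computed over the lowercased query string,
    # then base64-encoded and URL-encoded.
    digest = hmac.new(
        secret_key.encode(), query.lower().encode(), hashlib.sha1
    ).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"

# Placeholder credentials, not the ones quoted in this thread.
qs = sign_request(
    {"command": "listStoragePools", "response": "json", "apikey": "APIKEY"},
    "SECRETKEY",
)
```

The resulting string is appended to the management server's /client/api endpoint, as in the URLs above.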
>
>HEADERS:
>Date: Sat, 21 Jul 2012 01:25:47 GMT
>Server: Apache-Coyote/1.1
>Content-Length: 564
>Content-Type: text/javascript;charset=UTF-8
>Client-Date: Sat, 21 Jul 2012 01:25:46 GMT
>Client-Peer: 10.217.5.192:8080
>Client-Response-Num: 1
>CONTENT:
>HTTP/1.1 200 OK
>Date: Sat, 21 Jul 2012 01:25:47 GMT
>Server: Apache-Coyote/1.1
>Content-Length: 564
>Content-Type: text/javascript;charset=UTF-8
>Client-Date: Sat, 21 Jul 2012 01:25:46 GMT
>Client-Peer: 10.217.5.192:8080
>Client-Response-Num: 1
>
>{ "liststoragepoolsresponse" : { "count":1 ,"storagepool" : [
>{"id":"c9c0319f-33f0-3494-9ada-4d7a2f1dafd4",
>"zoneid":"8d17409d-3998-46b8-b7d5-db8b8070e077","zonename":"LS_ZONE1",
>"podid":"290355f6-8f2e-4d75-bd03-706509ae6d0c","podname":"LS_POD1",
>"name":"LS_PRIMARY1","ipaddress":"10.217.5.192",
>"path":"/home/export/primary","created":"2012-07-20T16:49:45-0700",
>"type":"NetworkFilesystem",
>"clusterid":"7d644592-e543-496e-aea0-fbbee863e7f0",
>"clustername":"LS_REZ12345","disksizetotal":104586543104,
>"disksizeallocated":8766477312,"tags":"","state":"Maintenance"} ] } }
>
>Then, attempting to delete this persistent storage pool:
>
>FINAL URL AFTER SPECIAL SUBSTITUTION(S):
>
>http://10.217.5.192:8080/client/api?apikey=OjtS1QBLFBFdu-gpDlS30r8dC1XhxyMr-UzdQ4Mi6GuXqCuzY882MI7PX2qNX80CkDMNYu5sTMc3yQeHwUmPqg&command=deleteStoragePool&id=c9c0319f-33f0-3494-9ada-4d7a2f1dafd4&forced=true&response=json&signature=IEBSjPRlp3Pn4HFvgX8sutJjGPU%3D
>
>Error My Final URL:
>http://10.217.5.192:8080/client/api?apikey=OjtS1QBLFBFdu-gpDlS30r8dC1XhxyMr-UzdQ4Mi6GuXqCuzY882MI7PX2qNX80CkDMNYu5sTMc3yQeHwUmPqg&command=deleteStoragePool&id=c9c0319f-33f0-3494-9ada-4d7a2f1dafd4&forced=true&response=json&signature=IEBSjPRlp3Pn4HFvgX8sutJjGPU%3D
><html>
><head><title>An Error Occurred</title></head>
><body>
><h1>An Error Occurred</h1>
><p>530 Unknown code</p>
></body>
></html>
>
>
>Regards,
>Evan
>
>
>
>-----Original Message-----
>From: Alena Prokharchyk
>Sent: Friday, July 20, 2012 5:07 PM
>To: Evan Miller; cloudstack-users@incubator.apache.org
>Subject: Re: System VMs restarted on a disabled cluster
>
>On 7/20/12 3:11 PM, "Evan Miller" <Evan.Miller@citrix.com> wrote:
>
>>Hi Alena:
>>
>>I finally was able to delete the cluster. However, it required the
>>following unexpected and unusual steps:
>>
>>1. To my surprise, one of the system VMs had only been stopped.
>>I swear that I previously viewed the system VMs from the CSMS GUI and
>>clearly thought I saw "no data".
>>
>>2. So, I destroyed that particular system VM.
>>
>>3. Then, I double-checked: No Storage from the CSMS GUI tab.
>>
>>4. Then, I double-checked: No Instances.
>>
>>5. I double-checked: Primary Storage was in maintenance mode.
>>
>>6. I attempted to delete Primary Storage, but it failed.
>>
>>7. So, I went into the MySQL database and saw several recent volumes
>>and just manually deleted all of them from MySQL:
>
>We never advise doing that; don't modify the DB unless there is no other
>way to recover. It can lead to all kinds of problems: first of all, the
>volumes will continue to exist on the backend, and there may be other
>CloudStack resources referencing the volumes (snapshots, for instance).
>Besides, if there is a removed field, don't delete the records; just mark
>them as removed.
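To illustrate the mark-for-removal pattern with a toy stand-in (SQLite and a cut-down volumes table; the real schema has many more columns):

```python
import sqlite3
from datetime import datetime, timezone

# Cut-down stand-in for the cloud database's volumes table.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE volumes (id INTEGER PRIMARY KEY, name TEXT,"
    " state TEXT, removed TEXT)"
)
db.execute("INSERT INTO volumes VALUES (2, 'ROOT-2', 'Destroy', NULL)")

# Instead of `DELETE FROM volumes WHERE id = 2`, set the removed
# timestamp so other records referencing the volume stay resolvable.
now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
db.execute("UPDATE volumes SET removed = ? WHERE id = 2", (now,))

# The row still exists; it is merely flagged as removed.
row = db.execute("SELECT name, removed FROM volumes WHERE id = 2").fetchone()
```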
>
>
>And in your case, all the volumes except one had a non-null removed
>field, which means they were successfully removed and didn't cause the
>primary storage deletion to fail.
>Only one of them had a null removed field (id=2); not sure what caused it
>to be stuck in the Destroy state. To force deletion of the storage pool
>next time, call the API command deleteStoragePool with the forced=true
>option (not sure if it's available in the UI; call the API if not)
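A sketch of that forced call; the apikey below is a placeholder, and the signature parameter a real request also needs is omitted for brevity:

```python
from urllib.parse import urlencode

# Placeholder credentials; a real request must also carry the signature
# parameter computed from these values and the account's secret key.
params = {
    "command": "deleteStoragePool",
    "id": "c9c0319f-33f0-3494-9ada-4d7a2f1dafd4",
    "forced": "true",  # force deletion despite leftover volume records
    "response": "json",
    "apikey": "APIKEY",
}
url = "http://10.217.5.192:8080/client/api?" + urlencode(params)
```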
>
>
>>
>>
>>mysql> select * from volumes;
>>| id | account_id | domain_id | pool_id | last_pool_id | instance_id | device_id | name | uuid | size | folder | path | pod_id | data_center_id | iscsi_name | host_ip | volume_type | pool_type | disk_offering_id | template_id | first_snapshot_backup_uuid | recreatable | created | attached | updated | removed | state | chain_info | update_count |
>>| 1 | 1 | 1 | 201 | NULL | 1 | 0 | ROOT-1 | c7442a08-7c15-453c-9041-2315295ef512 | 2147483648 | /home/export/primary | b10b18fc-7fab-4e03-8cb1-28fd77d4d42c | 1 | 1 | NULL | NULL | ROOT | NetworkFilesystem | 6 | 1 | NULL | 1 | 2012-07-20 19:20:29 | NULL | 2012-07-20 19:32:29 | 2012-07-20 19:32:30 | Destroy | NULL | 5 |
>>| 2 | 1 | 1 | 201 | NULL | 2 | 0 | ROOT-2 | ccbdb9ae-b2d7-4eda-b6a0-42b1c2a598fc | 2147483648 | /home/export/primary | ac9dbd13-790f-491c-be4f-0d757e5a6ac3 | 1 | 1 | NULL | NULL | ROOT | NetworkFilesystem | 8 | 1 | NULL | 1 | 2012-07-20 19:20:29 | NULL | 2012-07-20 19:34:33 | NULL | Destroy | NULL | 5 |
>>| 3 | 1 | 1 | NULL | NULL | 3 | 0 | ROOT-3 | 5fe6d1ce-e90f-4531-8b14-6c09d276dfd5 | 565240320 | NULL | NULL | NULL | 1 | NULL | NULL | ROOT | NULL | 6 | 1 | NULL | 1 | 2012-07-20 19:32:59 | NULL | 2012-07-20 19:33:29 | 2012-07-20 19:33:29 | Destroy | NULL | 2 |
>>| 4 | 1 | 1 | NULL | NULL | 4 | 0 | ROOT-4 | b7d1c546-e31a-4cbd-bb7c-1d5df44b8d1a | 565240320 | NULL | NULL | NULL | 1 | NULL | NULL | ROOT | NULL | 6 | 1 | NULL | 1 | 2012-07-20 19:33:59 | NULL | 2012-07-20 19:34:29 | 2012-07-20 19:34:29 | Destroy | NULL | 2 |
>>| 5 | 1 | 1 | NULL | NULL | 5 | 0 | ROOT-5 | 58f0d686-efd0-445a-9861-b5950dfcb8bb | 565240320 | NULL | NULL | NULL | 1 | NULL | NULL | ROOT | NULL | 6 | 1 | NULL | 1 | 2012-07-20 19:34:59 | NULL | 2012-07-20 19:35:09 | 2012-07-20 19:35:10 | Destroy | NULL | 2 |
>>| 6 | 1 | 1 | NULL | NULL | 6 | 0 | ROOT-6 | d38211db-9594-49a4-8621-08a513321e6f | 565240320 | NULL | NULL | NULL | 1 | NULL | NULL | ROOT | NULL | 6 | 1 | NULL | 1 | 2012-07-20 19:35:29 | NULL | 2012-07-20 21:34:02 | 2012-07-20 21:34:02 | Destroy | NULL | 2 |
>>6 rows in set (0.00 sec)
>>
>>mysql> delete from volumes;
>>Query OK, 6 rows affected (0.05 sec)
>>
>>mysql> select * from volumes;
>>Empty set (0.00 sec)
>>
>>mysql>
>>
>>8. After deletion, I went back to the CSMS GUI. I attempted to delete
>>Primary Storage, but it was gone already.
>>
>>9. I could then delete the cluster.
>>
>>Why was it necessary to manually delete the volumes from MySQL?
>>Was there something in one or more of those volume entries that
>>prevented deletion of the storage pool?
>
>
>
>It should never be necessary - see my comment on 7). We never advise
>people to mess with the database unless there is no other way to recover
>from the situation.
>
>>
>>NOTE: XenCenter was displaying c9c0319f-33f0-3494-9ada-4d7a2f1dafd4
>>as if it were a separate device, independent of the XenServers.
>>I couldn't delete c9c0319f-33f0-3494-9ada-4d7a2f1dafd4 from XenCenter
>>either. Is c9c0319f-33f0-3494-9ada-4d7a2f1dafd4 the NFS share?
>
>I don't know why it happens.
>
>>Regards,
>>Evan
>>
>>
>>-----Original Message-----
>>From: Alena Prokharchyk
>>Sent: Friday, July 20, 2012 12:58 PM
>>To: cloudstack-users@incubator.apache.org
>>Cc: Evan Miller
>>Subject: Re: System VMs restarted on a disabled cluster
>>
>>All volumes allocated in the pool have to be destroyed (the "Cannot
>>delete pool LS_PRIMARY1 as there are associated vols for this pool"
>>error indicates it). Please destroy all VMs using this pool.
>>
>>-Alena.
>>
>>On 7/20/12 12:52 PM, "Evan Miller" <Evan.Miller@citrix.com> wrote:
>>
>>>Hi Alena:
>>>
>>>I got thwarted on one of the cluster deletion steps.
>>>
>>>>* disable cluster
>>>>* enable maintenance for the primary storage in the cluster
>>>>* put hosts in cluster into maintenance mode
>>>>
>>>>* destroy system vms
>>>>* delete hosts and primary storage
>>>
>>>From CSMS GUI ...
>>>I can delete the hosts.
>>>However, I couldn't delete primary storage.
>>>The error said "Failed to delete storage pool".
>>>
>>>I can list the particular storage pool:
>>>
>>>FINAL URL AFTER SPECIAL SUBSTITUTION(S):
>>>
>>>http://10.217.5.192:8080/client/api?apikey=bb0HqLkZWZl87olMVaQ1MCWgt_3NPPfoWLorilzI-vDpwSgN1KF2KfSoUl00yHNxa8x2aYrMfG2d_s-FXu_Tfg&command=listStoragePools&clusterid=c03d4dee-d8cd-475b-962b-14149ba3be45&response=json&signature=7q%2BIr4lZMbsjctbnUidIej9gtgk%3D
>>>
>>>HEADERS:
>>>Date: Fri, 20 Jul 2012 19:43:40 GMT
>>>Server: Apache-Coyote/1.1
>>>Content-Length: 562
>>>Content-Type: text/javascript;charset=UTF-8
>>>Client-Date: Fri, 20 Jul 2012 19:43:39 GMT
>>>Client-Peer: 10.217.5.192:8080
>>>Client-Response-Num: 1
>>>CONTENT:
>>>HTTP/1.1 200 OK
>>>Date: Fri, 20 Jul 2012 19:43:40 GMT
>>>Server: Apache-Coyote/1.1
>>>Content-Length: 562
>>>Content-Type: text/javascript;charset=UTF-8
>>>Client-Date: Fri, 20 Jul 2012 19:43:39 GMT
>>>Client-Peer: 10.217.5.192:8080
>>>Client-Response-Num: 1
>>>
>>>{ "liststoragepoolsresponse" : { "count":1 ,"storagepool" : [
>>>{"id":"c9c0319f-33f0-3494-9ada-4d7a2f1dafd4",
>>>"zoneid":"5127f0df-0d5e-4a22-9c88-fba8ff592612","zonename":"LS_ZONE1",
>>>"podid":"c89cb02e-78f9-413f-8783-19d1baaddb03","podname":"LS_POD1",
>>>"name":"LS_PRIMARY1","ipaddress":"10.217.5.192",
>>>"path":"/home/export/primary","created":"2012-07-20T12:20:01-0700",
>>>"type":"NetworkFilesystem",
>>>"clusterid":"c03d4dee-d8cd-475b-962b-14149ba3be45",
>>>"clustername":"LS_R12345","disksizetotal":104586543104,
>>>"disksizeallocated":2712723968,"tags":"","state":"Maintenance"} ] } }
>>>
>>>NOTE: Under Storage tab from the GUI, there is no data.
>>>
>>>But I can't delete that storage pool:
>>>
>>>FINAL URL AFTER SPECIAL SUBSTITUTION(S):
>>>
>>>http://10.217.5.192:8080/client/api?apikey=bb0HqLkZWZl87olMVaQ1MCWgt_3NPPfoWLorilzI-vDpwSgN1KF2KfSoUl00yHNxa8x2aYrMfG2d_s-FXu_Tfg&command=deleteStoragePool&id=c9c0319f-33f0-3494-9ada-4d7a2f1dafd4&response=json&signature=8z4Rbi2t%2BzKHvCkJ2USIRC%2Bx8oQ%3D
>>>
>>>Error My Final URL:
>>>http://10.217.5.192:8080/client/api?apikey=bb0HqLkZWZl87olMVaQ1MCWgt_3NPPfoWLorilzI-vDpwSgN1KF2KfSoUl00yHNxa8x2aYrMfG2d_s-FXu_Tfg&command=deleteStoragePool&id=c9c0319f-33f0-3494-9ada-4d7a2f1dafd4&response=json&signature=8z4Rbi2t%2BzKHvCkJ2USIRC%2Bx8oQ%3D
>>><html>
>>><head><title>An Error Occurred</title></head>
>>><body>
>>><h1>An Error Occurred</h1>
>>><p>530 Unknown code</p>
>>></body>
>>></html>
>>>moonshine#
>>>
>>>The api log says:
>>>
>>>2012-07-20 12:46:05,499 INFO  [cloud.api.ApiServer]
>>>(catalina-exec-10:null) (userId=2 accountId=2
>>>sessionId=DC150E34937E29953352893CADABEA63) 10.216.134.53 -- GET
>>>command=deleteStoragePool&id=c9c0319f-33f0-3494-9ada-4d7a2f1dafd4&response=json&sessionkey=UsR2i5%2FbTT7zW8RfStD8aH6EqVA%3D&_=1342813564939 530
>>>Failed to delete storage pool
>>>
>>>The management log says this:
>>>
>>>2012-07-20 12:46:05,497 WARN  [cloud.storage.StorageManagerImpl]
>>>(catalina-exec-10:null) Cannot delete pool LS_PRIMARY1 as there are
>>>associated vols for this pool
>>>
>>>I need to be able to cleanly (and often) delete clusters, since each
>>>labscaler reservation will require a cluster.
>>>
>>>Is there something in the database that needs to be cleaned out?
>>>
>>>>* delete the cluster
>>>
>>>Regards,
>>>Evan
>>>
>>>
>>>-----Original Message-----
>>>From: Alena Prokharchyk
>>>Sent: Friday, July 13, 2012 4:26 PM
>>>To: Evan Miller
>>>Subject: FW: Re: System VMs restarted on a disabled cluster
>>>
>>>On 7/11/12 8:20 PM, "Mice Xia" <mice_xia@tcloudcomputing.com> wrote:
>>>
>>>>Hi, Alena,
>>>>
>>>>Im trying to follow your steps:
>>>>
>>>>* disable cluster
>>>>Succeed.
>>>>
>>>>* enable maintenance for the primary storage in the cluster
>>>>Maintenance on the VMware cluster failed for the first two tries,
>>>>with an error message like:
>>>>Unable to create a deployment for VM[ConsoleProxy|v-38-VM]
>>>>
>>>>WARN  [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:)
>>>>Exception while trying to start console proxy
>>>>com.cloud.exception.InsufficientServerCapacityException: Unable to
>>>>create a deployment for VM[ConsoleProxy|v-47-VM]Scope=interface
>>>>com.cloud.dc.DataCenter; id=1
>>>>
>>>>It seems each time a new system VM was created, but still on the
>>>>VMware cluster, which leads to failure. The maintenance succeeded on
>>>>the third try.
>>>>
>>>>* put hosts in cluster into maintenance mode
>>>>Succeed
>>>>
>>>>* destroy system vms
>>>>Destroying them does not stop them from being re-created
>>>>
>>>>* delete hosts and primary storage
>>>>Failed to delete primary storage, with message: there are still
>>>>volumes associated with this pool
>>>>
>>>>* delete the cluster
>>>>
>>>>
>>>>Putting hosts/storage into maintenance mode does not stop system VMs
>>>>from re-creating. From the code I can see the management server gets
>>>>the supported hypervisorTypes and always fetches the first one, and
>>>>the first one in my environment happens to be VMware.
>>>>
>>>>I have changed expunge.interval = expunge.delay = 120. Should I set
>>>>consoleproxy.restart = false and update the DB to set
>>>>secondary.storage.vm = false?
>>>>
>>>>Regards
>>>>Mice
>>>>
>>>>-----Original Message-----
>>>>From: Alena Prokharchyk [mailto:Alena.Prokharchyk@citrix.com]
>>>>Sent: July 12, 2012 10:03
>>>>To: cloudstack-dev@incubator.apache.org
>>>>Subject: Re: System VMs restarted on a disabled cluster
>>>>
>>>>On 7/11/12 6:29 PM, "Mice Xia" <mice_xia@tcloudcomputing.com> wrote:
>>>>
>>>>>Hi, All
>>>>>
>>>>>
>>>>>
>>>>>I've set up an environment with two clusters (in the same pod), one
>>>>>XenServer and the other VMware, based on the 3.0.x ASF branch.
>>>>>
>>>>>Now I'm trying to remove the VMware cluster, beginning with disabling
>>>>>it and destroying the system VMs running on it, but the system VMs
>>>>>restarted immediately on the VMware cluster, which blocks cluster
>>>>>removal.
>>>>>
>>>>>
>>>>>
>>>>>I wonder if this is the expected result by design, or would it be
>>>>>better if the system VMs were allocated on an enabled cluster?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Regards
>>>>>
>>>>>Mice
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>It's by design. A disabled cluster just can't be used for creating
>>>>new / starting existing user VMs / routers; but it can still be used
>>>>by system resources (SSVM and console proxy).
>>>>
>>>>To delete the cluster, you need to:
>>>>
>>>>* disable cluster
>>>>* enable maintenance for the primary storage in the cluster
>>>>* put hosts in cluster into maintenance mode
>>>>
>>>>* destroy system vms
>>>>* delete hosts and primary storage
>>>>* delete the cluster
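The steps above map roughly onto the following API commands, in order. The ids are placeholders; each call would be issued as a signed GET against /client/api like the other requests in this thread.

```python
# Ordered (command, key parameters) pairs mirroring the teardown steps.
teardown_sequence = [
    ("updateCluster", {"id": "CLUSTER_ID", "allocationstate": "Disabled"}),
    ("enableStorageMaintenance", {"id": "POOL_ID"}),
    ("prepareHostForMaintenance", {"id": "HOST_ID"}),
    ("destroySystemVm", {"id": "SYSTEM_VM_ID"}),
    ("deleteHost", {"id": "HOST_ID"}),
    ("deleteStoragePool", {"id": "POOL_ID"}),
    ("deleteCluster", {"id": "CLUSTER_ID"}),
]
commands = [c for c, _ in teardown_sequence]
```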
>>>>
>>>>-Alena.
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>

