cloudstack-issues mailing list archives

From "Nitin Mehta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-3294) CLONE - System VMs not coming up due to “InsufficientServerCapacityException”.(not consistently reproducible)
Date Sun, 30 Jun 2013 09:30:20 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696310#comment-13696310 ]

Nitin Mehta commented on CLOUDSTACK-3294:
-----------------------------------------

CLOUDSTACK-2813 has the short-term fix, but we need to look at resource cleanup holistically,
at least for virtual machines, and have a better fallback in case the cleanup fails.

Some ideas:
Add something like a cleanup flag that is set when the cleanup didn't work, and release
the resources before the next retry of VM deployment, in the expunge thread, etc. I am not
convinced this is the most elegant solution, though. Is this OK?
I was talking to Murali, and he suggested that, long term, we could make acquiring resources
transactional, or enhance a framework like Journal to keep a log of the resources acquired and
then release them. Any ideas?
If we go down the path of checking, use case by use case, why resource cleanup can fail, as the
fix in CLOUDSTACK-2813 does, we will end up with a lot of flags and if/else conditions. While
that fix solves this particular problem, I still see loopholes in our cleanup approach. At a
minimum we should start checking the cleanup() response: if it returns false, cleanup is not
done yet and needs to be taken care of later (say, before another retry of VM deployment or
in the expunge cycle). The next step could be making the cleanup function itself more robust
(for example, _networkMgr.release throws an exception and we just do nothing right now).
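
To make the journal and "check the cleanup() response" ideas concrete, here is a minimal
sketch. It is not CloudStack code: ResourceJournal, Entry, record() and needsCleanup() are
hypothetical names made up for illustration, and the real cleanup path would have to be adapted
to the existing managers. The assumption is that every acquired resource is logged with a way
to release it, that cleanup() releases in reverse order of acquisition, and that a failed
release is kept in the journal instead of being silently dropped.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical sketch of a resource journal; these names do not exist in CloudStack. */
class ResourceJournal {

    /** One acquired resource that knows how to release itself. */
    interface Entry {
        String name();
        void release() throws Exception;
    }

    // Most recently acquired entry sits at the head, so iteration
    // releases resources in reverse order of acquisition.
    private final Deque<Entry> pending = new ArrayDeque<>();

    /** Log a resource right after it has been acquired. */
    void record(Entry entry) {
        pending.push(entry);
    }

    /**
     * Try to release every journaled resource.
     * Returns true only if nothing is left to release; a false result
     * means the caller must retry later instead of assuming success.
     */
    boolean cleanup() {
        Iterator<Entry> it = pending.iterator();
        while (it.hasNext()) {
            Entry entry = it.next();
            try {
                entry.release();
                it.remove(); // released, so drop it from the journal
            } catch (Exception e) {
                // Do not swallow the failure: keep the entry for the next attempt.
                System.err.println("Release of " + entry.name() + " failed: " + e.getMessage());
            }
        }
        return pending.isEmpty();
    }

    /** Acts as the "cleanup flag": true while some release is still outstanding. */
    boolean needsCleanup() {
        return !pending.isEmpty();
    }
}
```

With something like this, the VM deployment retry or the expunge cycle would call cleanup()
again whenever needsCleanup() is true, so a single flag covers every kind of resource instead
of one flag and one if/else branch per failure case.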

                
> CLONE - System VMs not coming up due to “InsufficientServerCapacityException”.(not consistently reproducible)
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-3294
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3294
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.2.0
>            Reporter: Nitin Mehta
>            Priority: Critical
>             Fix For: 4.2.0
>
>         Attachments: management-server.zip
>
>
> Steps:
> 1. Have a CloudStack setup with an advanced zone.
> 2. Create some user VMs.
> 3. Create VPCs and VMs under the VPCs.
> 4. Shut down the host (Xen) and the management server (MS).
> 5. Start the host and the MS.
> Observation:
> The SSVM and CPVM were not coming up, failing with “InsufficientServerCapacityException”.
> The dashboard showed the management IPs as exhausted.
> Even after all the VMs were deleted, the IPs were not released.
> The table below shows that all the management IPs are reserved.
> mysql> select * from op_dc_ip_address_alloc;
> +----+---------------+----------------+--------+--------+--------------------------------------+---------------------+-------------+
> | id | ip_address    | data_center_id | pod_id | nic_id | reservation_id                       | taken               | mac_address |
> +----+---------------+----------------+--------+--------+--------------------------------------+---------------------+-------------+
> |  1 | 10.147.40.181 |              1 |      1 |     34 | 48d95839-6fb1-4bc4-b23a-c9f1891bf1fa | 2013-05-31 17:10:06 |           1 |
> |  2 | 10.147.40.182 |              1 |      1 |      3 | a7b9610c-9319-478c-84e4-e70be099cd9d | 2013-05-31 17:07:29 |           2 |
> |  3 | 10.147.40.183 |              1 |      1 |      7 | 238830cd-8cbe-411e-8016-352129885df6 | 2013-05-31 17:07:30 |           3 |
> |  4 | 10.147.40.184 |              1 |      1 |      7 | 70f091d4-acb4-435b-bfde-9bdb35bcfa6b | 2013-05-31 17:09:15 |           4 |
> |  5 | 10.147.40.185 |              1 |      1 |     29 | 14690352-e9a0-4695-a834-0552175f7684 | 2013-05-31 17:08:45 |           5 |
> |  6 | 10.147.40.186 |              1 |      1 |     30 | 14690352-e9a0-4695-a834-0552175f7684 | 2013-05-31 17:08:45 |           6 |
> |  7 | 10.147.40.187 |              1 |      1 |      4 | a7b9610c-9319-478c-84e4-e70be099cd9d | 2013-05-31 17:07:29 |           7 |
> |  8 | 10.147.40.188 |              1 |      1 |      7 | ea8644d1-7801-4dbb-aa0c-204f31e922a1 | 2013-05-31 17:08:25 |           8 |
> |  9 | 10.147.40.189 |              1 |      1 |      7 | 245e0082-d697-454d-9689-b36cc3b6e113 | 2013-05-31 17:11:16 |           9 |
> | 10 | 10.147.40.190 |              1 |      1 |      7 | 094e371a-da69-44e0-80fd-14c2d090e935 | 2013-05-31 17:10:15 |          10 |
> +----+---------------+----------------+--------+--------+--------------------------------------+---------------------+-------------+
> As all the IPs were in the reserved state, the SSVM and CPVM were not coming up.
> I was not able to reproduce this issue again.
> The server log is attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
