cloudstack-users mailing list archives

From Kambiz Darabi <dar...@m-creations.com>
Subject Re: Virtual Router doesn't start
Date Sat, 22 Mar 2014 12:41:13 GMT
Hi Alena,

thank you for your help.

The query returns no rows, i.e. nics.removed was not null. I removed the
row anyway to see what would happen: a new virtual router was created,
which also couldn't be started due to the same NPE. I then reverted the
change by restoring from the dump.

I should mention that prior to the restart, r-7-VM was the router
used by my instances. I deleted that router using the UI after the first
occurrence of the NPE, because a post about a similar problem suggested
that a deleted router would be recreated (and that this procedure had
solved the problem there).

Below I have attached the state of the two tables.

Anything else I can try?

Thank you


Kambiz

mysql> select n.id, n.removed, n.ip4_address, n.netmask, n.gateway, n.ip_type, n.reserver_name,
n.network_id, i.id as instance_id, i.name, i.state, i.type from vm_instance i join nics n
on n.instance_id = i.id where i.type = 'DomainRouter';
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
| id | removed             | ip4_address   | netmask       | gateway     | ip_type | reserver_name            | network_id | instance_id | name    | state     | type         |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
|  9 | 2014-03-17 11:27:58 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        204 |           4 | r-4-VM  | Expunging | DomainRouter |
| 10 | 2014-03-17 11:27:58 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           4 | r-4-VM  | Expunging | DomainRouter |
| 11 | 2014-03-17 11:27:58 | 10.193.17.139 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           4 | r-4-VM  | Expunging | DomainRouter |
| 14 | 2014-03-17 11:27:52 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |           7 | r-7-VM  | Expunging | DomainRouter |
| 15 | 2014-03-17 11:27:52 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           7 | r-7-VM  | Expunging | DomainRouter |
| 16 | 2014-03-17 11:27:52 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           7 | r-7-VM  | Expunging | DomainRouter |
| 26 | 2014-03-18 08:11:16 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          18 | r-18-VM | Expunging | DomainRouter |
| 27 | 2014-03-18 08:11:16 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          18 | r-18-VM | Expunging | DomainRouter |
| 28 | 2014-03-18 08:11:16 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          18 | r-18-VM | Expunging | DomainRouter |
| 29 | NULL                | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          19 | r-19-VM | Stopped   | DomainRouter |
| 30 | NULL                | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          19 | r-19-VM | Stopped   | DomainRouter |
| 31 | NULL                | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          19 | r-19-VM | Stopped   | DomainRouter |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+

mysql> select * from router_network_ref;
+----+-----------+------------+------------+
| id | router_id | network_id | guest_type |
+----+-----------+------------+------------+
|  1 |         4 |        204 | Isolated   |
|  2 |         7 |        205 | Isolated   |
|  3 |        18 |        205 | Isolated   |
|  4 |        19 |        205 | Isolated   |
+----+-----------+------------+------------+
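
As a cross-check, the query Alena suggested can be replayed against the
rows posted above. This is only a minimal sketch: it assumes an in-memory
SQLite database as a stand-in for the MySQL cloud db, models only the
columns the query touches, and the helper name stale_refs is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL "cloud" database (assumption);
# only the columns referenced by the suggested query are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nics (id INTEGER, instance_id INTEGER,
                   network_id INTEGER, removed TEXT);
CREATE TABLE router_network_ref (id INTEGER, router_id INTEGER,
                                 network_id INTEGER, guest_type TEXT);
""")

# (id, instance_id, network_id, removed), copied from the nics result set.
NICS = [
    (9, 4, 204, "2014-03-17 11:27:58"),
    (10, 4, 202, "2014-03-17 11:27:58"),
    (11, 4, 200, "2014-03-17 11:27:58"),
    (14, 7, 205, "2014-03-17 11:27:52"),
    (15, 7, 202, "2014-03-17 11:27:52"),
    (16, 7, 200, "2014-03-17 11:27:52"),
    (26, 18, 205, "2014-03-18 08:11:16"),
    (27, 18, 202, "2014-03-18 08:11:16"),
    (28, 18, 200, "2014-03-18 08:11:16"),
    (29, 19, 205, None),
    (30, 19, 202, None),
    (31, 19, 200, None),
]
# (id, router_id, network_id, guest_type), copied from router_network_ref.
REFS = [
    (1, 4, 204, "Isolated"),
    (2, 7, 205, "Isolated"),
    (3, 18, 205, "Isolated"),
    (4, 19, 205, "Isolated"),
]
conn.executemany("INSERT INTO nics VALUES (?, ?, ?, ?)", NICS)
conn.executemany("INSERT INTO router_network_ref VALUES (?, ?, ?, ?)", REFS)

# Same shape as the query from the thread, with the router id as parameter.
STALE_REFS_SQL = """
SELECT id, router_id, network_id
FROM router_network_ref
WHERE router_id = ?
  AND network_id NOT IN (
      SELECT network_id FROM nics
      WHERE instance_id = ? AND removed IS NULL)
"""

def stale_refs(router_id):
    """Return router_network_ref rows with no live (non-removed) nic."""
    return conn.execute(STALE_REFS_SQL, (router_id, router_id)).fetchall()

print(stale_refs(19))  # [] : r-19-VM still has a live nic in network 205
print(stale_refs(18))  # [(3, 18, 205)] : all nics of r-18-VM are removed
```

For router 19 the result is empty, which matches the "no rows" outcome
reported above; only the already expunged routers show leftover refs.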



Alena Prokharchyk <Alena.Prokharchyk@citrix.com> wrote:
> 
> The error happens not because Ip is null, but because the nic in a certain
> network can't be found. Looks like there is some bug in VPC nic
> plug/unplug for Guest networks process.
>
> Kambiz, please do the following to fix it:
>
> 1) Stop the MS
> 2) Take a DB dump of the cloud db in case you have to revert.
> 3) Run the query:
>
> select * from router_network_ref where router_id=<ID of your VR> and
> network_id not in (select network_id from nics where instance_id=<ID of
> your VR> and removed is null);
>
> It will give you the list of network refs that somehow weren't cleaned
> during the nic detach. Remove the entry returned from router_network_ref
> table.
>
> Let me know how it works.
>
> -Alena.
>
>
> On 3/21/14, 3:36 PM, "Kambiz Darabi" <darabi@m-creations.com> wrote:
>
>>Hello,
>>
>>as this is my first post to the list, I would like to thank all
>>contributors for CloudStack, which I have been using since last fall
>>without any problems. I run 4.1.1 with KVM and advanced networking.
>>
>>After a restart of the management server (stopping and starting the java
>>process), the virtual domain router doesn't start and
>>management-server.log shows a NullPointerException in
>>NetworkModelImpl.getIpInNetwork (cf. stack trace below).
>>
>>By putting the server in debug mode and remote debugging, I found out
>>that the reason is a row in the table nics which has NULL in ip (cf. row
>>with id 30 in the result of the select statement below).
>>
>>What can I do to quickly solve this problem? Any pointers or suggestions
>>are appreciated as the system is currently unusable.
>>
>>Thank you for your help
>>
>>
>>Kambiz
>>
>>
>>management-server.log:
>>
>>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking VirtualRouter to prepare for
>>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking Ovs to prepare for
>>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for
>>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for
>>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>>2014-03-18 10:03:27,151 WARN  [network.element.VpcVirtualRouterElement]
>>(Job-Executor-1:job-176) Network Ntwk[205|Guest|8] is not associated with
>>any VPC
>>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking NiciraNvp to prepare for
>>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>>2014-03-18 10:03:27,151 DEBUG [network.element.NiciraNvpElement]
>>(Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service
>>Connectivity on network net1
>>2014-03-18 10:03:27,153 DEBUG [cloud.network.NetworkModelImpl]
>>(Job-Executor-1:job-176) Service SecurityGroup is not supported in the
>>network id=205
>>2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Lock is acquired for network id 202 as a part of
>>network implement
>>2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Network id=202 is already implemented
>>2014-03-18 10:03:27,157 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Lock is released for network id 202 as a part of
>>network implement
>>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking VirtualRouter to prepare for
>>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking Ovs to prepare for
>>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for
>>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for
>>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>>2014-03-18 10:03:27,187 WARN  [network.element.VpcVirtualRouterElement]
>>(Job-Executor-1:job-176) Network Ntwk[202|Control|3] is not associated
>>with any VPC
>>2014-03-18 10:03:27,188 DEBUG [cloud.network.NetworkManagerImpl]
>>(Job-Executor-1:job-176) Asking NiciraNvp to prepare for
>>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>>2014-03-18 10:03:27,188 DEBUG [network.element.NiciraNvpElement]
>>(Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service
>>Connectivity on network null
>>2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl]
>>(Job-Executor-1:job-176) Checking if we need to prepare 1 volumes for
>>VM[DomainRouter|r-19-VM]
>>2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl]
>>(Job-Executor-1:job-176) No need to recreate the volume:
>>Vol[24|vm=19|ROOT], since it already has a pool assigned: 200, adding
>>disk to VM
>>2014-03-18 10:03:27,224 DEBUG
>>[network.router.VirtualNetworkApplianceManagerImpl]
>>(Job-Executor-1:job-176) Boot Args for VM[DomainRouter|r-19-VM]:
>>template=domP name=r-19-VM eth2ip=10.193.17.190 eth2mask=255.255.255.0
>>gateway=10.193.17.1 eth0ip=10.124.99.1 eth0mask=255.255.255.0
>>domain=cs6cloud.internal dhcprange=10.124.99.1 eth0ip=169.254.3.99
>>eth0mask=255.255.0.0 type=router disable_rp_filter=true dns1=10.193.17.1
>>2014-03-18 10:03:27,343 DEBUG
>>[network.router.VirtualNetworkApplianceManagerImpl]
>>(Job-Executor-1:job-176) Found 8 ip(s) to apply as a part of domR
>>VM[DomainRouter|r-19-VM] start.
>>2014-03-18 10:03:27,415 DEBUG
>>[network.router.VirtualNetworkApplianceManagerImpl]
>>(Job-Executor-1:job-176) Resending ipAssoc, port forwarding, load
>>balancing rules as a part of Virtual router start
>>2014-03-18 10:03:27,499 DEBUG
>>[network.router.VirtualNetworkApplianceManagerImpl]
>>(Job-Executor-1:job-176) Found 12 firewall Egress rule(s) to apply as a
>>part of domR VM[DomainRouter|r-19-VM] start.
>>2014-03-18 10:03:27,593 ERROR [cloud.vm.VirtualMachineManagerImpl]
>>(Job-Executor-1:job-176) Failed to start instance VM[DomainRouter|r-19-VM]
>>java.lang.NullPointerException
>>	at com.cloud.network.NetworkModelImpl.getIpInNetwork(NetworkModelImpl.java:763)
>>	at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.finalizeNetworkRulesForNetwork(VirtualNetworkApplianceManagerImpl.java:2346)
>>	at com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl.finalizeNetworkRulesForNetwork(VpcVirtualNetworkApplianceManagerImpl.java:928)
>>	at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.finalizeCommandsOnStart(VirtualNetworkApplianceManagerImpl.java:2241)
>>	at com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl.finalizeCommandsOnStart(VpcVirtualNetworkApplianceManagerImpl.java:767)
>>	at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.finalizeDeployment(VirtualNetworkApplianceManagerImpl.java:2205)
>>	at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:763)
>>	at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:471)
>>	at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.start(VirtualNetworkApplianceManagerImpl.java:2616)
>>	at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startVirtualRouter(VirtualNetworkApplianceManagerImpl.java:1824)
>>	at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouters(VirtualNetworkApplianceManagerImpl.java:1924)
>>	at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.deployVirtualRouterInGuestNetwork(VirtualNetworkApplianceManagerImpl.java:1902)
>>	at com.cloud.network.element.VirtualRouterElement.implement(VirtualRouterElement.java:175)
>>	at com.cloud.network.NetworkManagerImpl.implementNetworkElementsAndResources(NetworkManagerImpl.java:1518)
>>	at com.cloud.network.NetworkManagerImpl.implementNetwork(NetworkManagerImpl.java:1434)
>>	at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>	at com.cloud.network.NetworkManagerImpl.startNetwork(NetworkManagerImpl.java:2435)
>>	at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2855)
>>	at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2824)
>>	at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>	at org.apache.cloudstack.api.command.admin.router.StartRouterCmd.execute(StartRouterCmd.java:103)
>>
>>
>>table nics:
>>
>>mysql> select * from nics where reserver_name = 'ControlNetworkGuru';
>>+----+--------------------------------------+-------------+---------------
>>----+---------------+-------------+-------------+---------+---------------
>>+------------+--------+--------------+----------+--------------------+----
>>----------------------------------+-----------+---------------------+-----
>>----------+-------------+-------------+--------------------+--------------
>>-------+---------------------+-------------+----------+
>>| id | uuid                                 | instance_id | mac_address
>>    | ip4_address   | netmask     | gateway     | ip_type | broadcast_uri
>>| network_id | mode   | state        | strategy | reserver_name      |
>>reservation_id                       | device_id | update_time         |
>>isolation_uri | ip6_address | default_nic | vm_type            | created
>>           | removed             | ip6_gateway | ip6_cidr |
>>+----+--------------------------------------+-------------+---------------
>>----+---------------+-------------+-------------+---------+---------------
>>+------------+--------+--------------+----------+--------------------+----
>>----------------------------------+-----------+---------------------+-----
>>----------+-------------+-------------+--------------------+--------------
>>-------+---------------------+-------------+----------+
>>|  2 | 289aacb8-cfd7-4879-a632-6cfbda36cbf4 |           1 |
>>0e:00:a9:fe:00:55 | 169.254.0.85  | 255.255.0.0 | 169.254.0.1 | Ip4     |
>>NULL          |        202 | Static | Reserved     | Start    |
>>ControlNetworkGuru | 993864b4-9dde-47d6-8fd6-cf94050442c6 |         0 |
>>2014-03-17 22:21:38 | NULL          | NULL        |           0 |
>>SecondaryStorageVm | 2013-09-06 12:44:42 | NULL                | NULL
>>   | NULL     |
>>|  6 | 5fdf4b1a-b90c-4c79-9d42-9eaf87eaa042 |           2 |
>>0e:00:a9:fe:02:d3 | 169.254.2.211 | 255.255.0.0 | 169.254.0.1 | Ip4     |
>>NULL          |        202 | Static | Reserved     | Start    |
>>ControlNetworkGuru | 852e0a65-c72a-448f-ac71-2bb3549a5a41 |         0 |
>>2014-03-17 22:21:38 | NULL          | NULL        |           0 |
>>ConsoleProxy       | 2013-09-06 12:44:42 | NULL                | NULL
>>   | NULL     |
>>| 10 | 4c4e6368-95d7-419a-a9b3-a5bb394197f0 |           4 | NULL
>>    | NULL          | NULL        | NULL        | NULL    | NULL
>>|        202 | Static | Deallocating | Start    | ControlNetworkGuru |
>>c28e8ddc-c106-462e-96c8-5d5216dad9b7 |         1 | 2014-03-17 12:27:58 |
>>NULL          | NULL        |           0 | DomainRouter       |
>>2013-09-10 08:08:39 | 2014-03-17 11:27:58 | NULL        | NULL     |
>>| 15 | 1f2e99c0-9cd9-47aa-ab10-f190efd7a2dc |           7 | NULL
>>    | NULL          | NULL        | NULL        | NULL    | NULL
>>|        202 | Static | Deallocating | Start    | ControlNetworkGuru |
>>ca1aa99e-e630-4533-9642-523d8a8b1fea |         1 | 2014-03-17 12:27:52 |
>>NULL          | NULL        |           0 | DomainRouter       |
>>2013-09-12 10:58:03 | 2014-03-17 11:27:52 | NULL        | NULL     |
>>| 27 | 1c98c4f2-f604-4a38-a813-f68833b1d250 |          18 | NULL
>>    | NULL          | NULL        | NULL        | NULL    | NULL
>>|        202 | Static | Deallocating | Start    | ControlNetworkGuru |
>>ad8e0e50-72aa-4c68-8634-8dc89f12fe01 |         1 | 2014-03-18 09:11:16 |
>>NULL          | NULL        |           0 | DomainRouter       |
>>2014-03-17 11:28:50 | 2014-03-18 08:11:16 | NULL        | NULL     |
>>| 30 | cabd4cd9-c39f-423f-ad6a-ee3affe0bd9d |          19 | NULL
>>    | NULL          | NULL        | NULL        | NULL    | NULL
>>|        202 | Static | Allocated    | Start    | ControlNetworkGuru |
>>e81ba56d-a101-4c60-b44f-a0890d56aad9 |         1 | 2014-03-18 09:11:44 |
>>NULL          | NULL        |           0 | DomainRouter       |
>>2014-03-18 08:11:32 | NULL                | NULL        | NULL     |
>>+----+--------------------------------------+-------------+---------------
>>----+---------------+-------------+-------------+---------+---------------
>>+------------+--------+--------------+----------+--------------------+----
>>----------------------------------+-----------+---------------------+-----
>>----------+-------------+-------------+--------------------+--------------
>>-------+---------------------+-------------+----------+
>>
>>
