cloudstack-issues mailing list archives

From "venkata swamybabu budumuru (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-4199) Redundant Virtual Router - no failover occur
Date Fri, 16 Aug 2013 08:57:47 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742042#comment-13742042 ]

venkata swamybabu budumuru commented on CLOUDSTACK-4199:
--------------------------------------------------------

I have also seen this issue every time during failover. Steps to reproduce:

1. Create one advanced zone with a KVM cluster (2 KVM hosts).
2. Create a network offering with RVR enabled.

*************************** 15. row ***************************
                       id: 15
                     name: RVR
                     uuid: 4e91c49f-5870-43e1-9865-0a84cd7b72ae
              unique_name: RVR
             display_text: RVR
                  nw_rate: NULL
                  mc_rate: 10
             traffic_type: Guest
                     tags: NULL
              system_only: 0
             specify_vlan: 0
      service_offering_id: NULL
            conserve_mode: 1
                  created: 2013-08-16 05:05:34
                  removed: NULL
                  default: 0
             availability: Optional
     dedicated_lb_service: 1
shared_source_nat_service: 0
                 sort_key: 0
 redundant_router_service: 1       =========> RVR is enabled
                    state: Enabled
               guest_type: Isolated
       elastic_ip_service: 0
  eip_associate_public_ip: 0
       elastic_lb_service: 0
        specify_ip_ranges: 0
                   inline: 0
            is_persistent: 1   =====> Persistent is enabled.
              internal_lb: 0
                public_lb: 1
    egress_default_policy: 1
   concurrent_connections: NULL
15 rows in set (0.00 sec)
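
The two flags marked above can also be checked mechanically. The following is a minimal Python sketch, assuming the row was captured as text from the mysql client in vertical (\G) format as shown; the function names are hypothetical and the hand-written "====>" annotations are stripped before comparison.

```python
import re

def parse_vertical_row(text):
    """Parse one row of mysql vertical (\\G) output into a dict of strings."""
    row = {}
    for line in text.splitlines():
        # Skip the "**** 15. row ****" separator and lines without a column.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        # Drop hand-written annotations such as "=====> Persistent is enabled."
        value = re.split(r"\s+=+>", value, maxsplit=1)[0].strip()
        row[key.strip()] = None if value == "NULL" else value
    return row

def is_persistent_rvr_offering(row):
    """True when both the redundant-router and persistent flags are set."""
    return (row.get("redundant_router_service") == "1"
            and row.get("is_persistent") == "1")

# Excerpt of the offering row shown above.
SAMPLE_ROW = """*************************** 15. row ***************************
                       id: 15
                     name: RVR
                  nw_rate: NULL
 redundant_router_service: 1       =========> RVR is enabled
            is_persistent: 1   =====> Persistent is enabled.
                  created: 2013-08-16 05:05:34
"""

offering = parse_vertical_row(SAMPLE_ROW)
print(offering["name"], is_persistent_rvr_offering(offering))
```

Splitting on the first colon only keeps timestamp values such as the created column intact.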

3. As a non-ROOT domain user, try to deploy a VM using the above network offering.

non-ROOT domain user info :

username : dom1User1
password : password
domain   : dom1

*************************** 20. row ***************************
                   id: 220
                 name: swamyRVRNetwork
                 uuid: 215f3f85-dca2-45e4-9cab-607654677575
         display_text: swamyRVRNetwork
         traffic_type: Guest
broadcast_domain_type: Vlan
        broadcast_uri: vlan://908
              gateway: 10.1.1.1
                 cidr: 10.1.1.0/24
                 mode: Dhcp
  network_offering_id: 15
  physical_network_id: 200
       data_center_id: 1
            guru_name: ExternalGuestNetworkGuru
                state: Implemented
              related: 220
            domain_id: 2
           account_id: 3
                 dns1: NULL
                 dns2: NULL
            guru_data: NULL
           set_fields: 0
             acl_type: Account
       network_domain: cs3auto.advanced
       reservation_id: c81b7838-db46-4d54-a5ed-4f6261802fb6
           guest_type: Isolated
     restart_required: 0
              created: 2013-08-16 07:30:48
              removed: NULL
    specify_ip_ranges: 0
               vpc_id: NULL
          ip6_gateway: NULL
             ip6_cidr: NULL
         network_cidr: NULL
      display_network: 1
       network_acl_id: NULL

*************************** 48. row ***************************
                  id: 48
                name: VM1Swamy
                uuid: 6bfe2221-74b7-4de6-9b46-ae2f5ea1a661
       instance_name: i-3-48-QA
               state: Running
      vm_template_id: 202
         guest_os_id: 112
 private_mac_address: 02:00:68:99:00:03
  private_ip_address: 10.1.1.23
              pod_id: 1
      data_center_id: 1
             host_id: 2
        last_host_id: 2
            proxy_id: NULL
   proxy_assign_time: NULL
        vnc_password: WFdUuz6e2W97XHGv7YnHc/8b0BH/HqK3eWpX3zxP97U=
          ha_enabled: 0
       limit_cpu_use: 0
        update_count: 3
         update_time: 2013-08-16 07:35:17
             created: 2013-08-16 07:33:25
             removed: NULL
                type: User
             vm_type: User
          account_id: 3
           domain_id: 2
 service_offering_id: 2
      reservation_id: 3baf28f3-745b-4dad-8fe9-8bab92bec033
     hypervisor_type: KVM
    disk_offering_id: NULL
                 cpu: NULL
                 ram: NULL
               owner: 3
               speed: 1000
           host_name: VM1Swamy
        display_name: VM1Swamy
       desired_state: NULL
dynamically_scalable: 0
          display_vm: 1

4. The above steps deployed the RVR routers without any issues.

*************************** 46. row ***************************
                  id: 46
                name: r-46-QA   =====================================> This became MASTER
                uuid: d044fae3-316e-4546-b832-ab9e12b074a3
       instance_name: r-46-QA
               state: Stopped
      vm_template_id: 3
         guest_os_id: 15
 private_mac_address: 0e:00:a9:fe:01:69
  private_ip_address: 169.254.1.105
              pod_id: 1
      data_center_id: 1
             host_id: NULL
        last_host_id: 3
            proxy_id: NULL
   proxy_assign_time: NULL
        vnc_password: eMTnIdbchG5GWMGzs5awGTGs4M7LuYjmLBlmCMMBLSw=
          ha_enabled: 0
       limit_cpu_use: 0
        update_count: 5
         update_time: 2013-08-16 07:41:43
             created: 2013-08-16 07:30:48
             removed: NULL
                type: DomainRouter
             vm_type: DomainRouter
          account_id: 3
           domain_id: 2
 service_offering_id: 7
      reservation_id: c70dbe54-8f26-40c0-a111-720b77d4a2c1
     hypervisor_type: KVM
    disk_offering_id: NULL
                 cpu: NULL
                 ram: NULL
               owner: NULL
               speed: NULL
           host_name: NULL
        display_name: NULL
       desired_state: NULL
dynamically_scalable: 0
          display_vm: 1

*************************** 47. row ***************************
                  id: 47
                name: r-47-QA  =====================================> This became BACKUP
                uuid: 49080daa-3a00-4967-94cf-594b42375e6e
       instance_name: r-47-QA
               state: Running
      vm_template_id: 3
         guest_os_id: 15
 private_mac_address: 0e:00:a9:fe:03:8d
  private_ip_address: 169.254.3.141
              pod_id: 1
      data_center_id: 1
             host_id: 2
        last_host_id: 2
            proxy_id: NULL
   proxy_assign_time: NULL
        vnc_password: UZ483zh1Nq/Ydq2mg1/v4I7mRqaSShk6vd6tWx84rQI=
          ha_enabled: 0
       limit_cpu_use: 0
        update_count: 7
         update_time: 2013-08-16 07:33:24
             created: 2013-08-16 07:30:49
             removed: NULL
                type: DomainRouter
             vm_type: DomainRouter
          account_id: 3
           domain_id: 2
 service_offering_id: 7
      reservation_id: 4ad79ebb-7c77-43b4-add2-fd3669d94d2f
     hypervisor_type: KVM
    disk_offering_id: NULL
                 cpu: NULL
                 ram: NULL
               owner: NULL
               speed: NULL
           host_name: NULL
        display_name: NULL
       desired_state: NULL
dynamically_scalable: 0
          display_vm: 1

5. Stop the MASTER VR from CloudStack

Observations:

(i) The MASTER router went into the Stopped state successfully, but the BACKUP router got
stuck in the "FAULT" state and never recovered.

Here is a snippet of keepalived.log from the router in the FAULT state:

root@r-47-QA:~# cat /ramdisk/rrouter/keepalived.log 
To backup called
Disable public ip 0
Password server is not running
Stopping DNS forwarder and DHCP server: dnsmasq(not running) ... (warning).
cache internal:
current active connections:	           0
connections created:		           0	failed:	           0
connections updated:		           0	failed:	           0
connections destroyed:		           0	failed:	           0

cache external:
current active connections:	           0
connections created:		           0	failed:	           0
connections updated:		           0	failed:	           0
connections destroyed:		           0	failed:	           0

traffic processed:
                   0 Bytes                         0 Pckts

multicast traffic (active device=eth0):
                   8 Bytes sent                    0 Bytes recv
                   1 Pckts sent                    0 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

Conntrackd switch to backup done
Switch conntrackd mode backup 0
Status: BACKUP
To master called
ifdown: interface eth2 not configured
RTNETLINK answers: File exists
Failed to bring up eth2.
RTNETLINK answers: No such process
Enable public ip returned 2
Fail to enable public ip!
Password server is not running
Stopping DNS forwarder and DHCP server: dnsmasq(not running) ... (warning).
Stopping keepalived: keepalived.
Stopping conntrackd.
Status: FAULT (RTNETLINK answers: No such process)
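
To compare logs like the one above across routers, the role transitions and error lines can be pulled out with a short script. This is only a sketch, assuming the log uses the same line formats seen in this snippet ("To ... called", "Status: ...", "RTNETLINK answers: ...", "Fail..."); the function name is hypothetical and other keepalived.log variants may need more patterns.

```python
import re

def summarize_keepalived_log(text):
    """Return (transitions, final_status, errors) for keepalived.log text."""
    transitions = []      # roles requested, in order, e.g. ["BACKUP", "MASTER"]
    final_status = None   # last "Status: X" seen in the log
    errors = []           # lines that look like failures
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"To (backup|master) called", line)
        if m:
            transitions.append(m.group(1).upper())
            continue
        m = re.match(r"Status: (\w+)", line)
        if m:
            final_status = m.group(1)
            continue
        if line.startswith(("Failed", "Fail ", "RTNETLINK answers")):
            errors.append(line)
    return transitions, final_status, errors

# Excerpt of the log shown above.
SAMPLE_LOG = """To backup called
Status: BACKUP
To master called
ifdown: interface eth2 not configured
RTNETLINK answers: File exists
Failed to bring up eth2.
RTNETLINK answers: No such process
Enable public ip returned 2
Fail to enable public ip!
Status: FAULT (RTNETLINK answers: No such process)
"""

transitions, status, errors = summarize_keepalived_log(SAMPLE_LOG)
print(transitions, status)
for err in errors:
    print("  ", err)
```

For the log above, this makes the sequence explicit: the router was asked to become MASTER, failed to bring up eth2 and enable the public IP, and ended in FAULT.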


I am attaching the following logs to the bug, along with the management server DB dump:

- mgmt server log
- db dump
- MASTER (before reboot logs)
  * ifconfig output
  * ifconfig -a output
  * /ramdisk/rrouter/keepalived.log
  * checkrouter.sh output
- BACKUP (before and after failover)
  * ifconfig output
  * ifconfig -a output
  * /ramdisk/rrouter/keepalived.log
  * checkrouter.sh output


                
> Redundant Virtual Router - no failover occur
> --------------------------------------------
>
>                 Key: CLOUDSTACK-4199
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4199
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.2.0
>         Environment: MS        ACS 4.2  campo    internal build   341
> host       XS 6.2
>            Reporter: angeline shen
>            Priority: Critical
>             Fix For: 4.2.0
>
>         Attachments: management-server.log.gz, Screenshot-CloudPlatform™ - Mozilla Firefox-3.png, Screenshot-CloudPlatform™ - Mozilla Firefox-4.png
>
>
> 1. Create network offering 'egallowrvrnw1' with egress firewall policy: allow, redundant router,
>    advanced zone. Create a network with this offering. Create guest VMs.
>    Verify ssh to VMs. VMs can ping other VMs in this network and reach the internet.
> 2. RVR  MASTER     r-37-VM
>    RVR  BACKUP      r-38-VM
>    stop  r-37-VM    
> Result:    r-37-VM    state becomes UNKNOWN
>               r-38-VM    state becomes  FAULT
>              no failover occur
>             Cannot  ssh to existing   VMs
> 3. start r-37-VM.
> Result:    r-37-VM    state becomes MASTER
>               r-38-VM    state remains   FAULT
>              VMs can reach other VMs in same network.   
>           VMs cannot reach internet
> 4. stop  r-37-VM
>               r-37-VM    state becomes UNKNOWN
>               r-38-VM    state becomes  FAULT
>              no failover occur
>             Cannot  ssh to existing   VMs
> r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 1 networks to
update RvR status. 
> 2013-08-08 19:22:44,763 INFO  [network.router.VirtualNetworkApplianceManagerImpl] (RedundantRouterStatusMonitor-6:null)
Redundant virtual router (name: r-37-VM, id: 37)  just switch from MASTER to UNKNOWN
> 2013-08-08 19:22:44,768 DEBUG [agent.transport.Request] (RedundantRouterStatusMonitor-6:null)
Seq 1-2062888873: Sending  { Cmd , MgmtId: 7343890761426, via: 1, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.CheckRouterCommand":{"a
> ccessDetails":{"router.ip":"169.254.3.245","router.name":"r-38-VM"},"wait":30}}] }
> 2013-08-08 19:22:44,769 DEBUG [agent.transport.Request] (RedundantRouterStatusMonitor-6:null)
Seq 1-2062888873: Executing:  { Cmd , MgmtId: 7343890761426, via: 1, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.CheckRouterCommand":
> 2013-08-08 19:22:45,116 INFO  [network.router.VirtualNetworkApplianceManagerImpl] (RedundantRouterStatusMonitor-6:null)
Redundant virtual router (name: r-38-VM, id: 38)  just switch from BACKUP to FAULT
> 2013-08-08 19:22:45,344 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-270:null)
Seq 1-2062888874: Response Received: 
> 2013-08-08 19:22:45,345 DEBUG [agent.transport.Request] (DirectAgent-270:null) Seq 1-2062888874:
Processing:  { Ans: , MgmtId: 7343890761426, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.CheckRouterAnswer":{"state":"FAULT","
> isBumped":false,"result":true,"details":"Status: FAULT (RTNETLINK answers: No such process)&Bumped:
NO","wait":0}}] }
> 2013-08-08 19:22:45,345 DEBUG [agent.transport.Request] (RedundantRouterStatusMonitor-6:null)
Seq 1-2062888874: Received:  { Ans: , MgmtId: 7343890761426, via: 1, Ver: v1, Flags: 10, {
CheckRouterAnswer } }
> 2013-08-08 19:22:45,345 DEBUG [agent.manager.AgentManagerImpl] (RedundantRouterStatusMonitor-6:null)
Details from executing class com.cloud.agent.api.CheckRouterCommand: Status: FAULT (RTNETLINK
answers: No such process)&Bumped: N
> O
> 2013-08-08 19:22:45,349 INFO  [network.router.VirtualNetworkApplianceManagerImpl] (RedundantRouterStatusMonitor-6:null)
Redundant virtual router (name: r-38-VM, id: 38)  just switch from BACKUP to FAULT
> 2013-08-08 19:22:46,781 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-13:null)
Ping from 2
> 2013-08-08 19:22:47,125 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-12:null)
Ping from 3
>   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
