cloudstack-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-8952) The redundant routers are facing a race condition due to several KeepaliveD/ConntrackD restarts
Date Sat, 17 Oct 2015 16:00:06 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961951#comment-14961951 ]

ASF GitHub Bot commented on CLOUDSTACK-8952:
--------------------------------------------

Github user wilderrodrigues commented on the pull request:

    https://github.com/apache/cloudstack/pull/940#issuecomment-148926525
  
    Hi @remibergsma @karuturi @miguelaferreira @wido @borisroman @bhaisaab @bvbharat 
    
    Please have a look at this PR.
    
    == Hardware required tests ==
    
    * Management Server + MySQL running on CentOS 7.1
    * One KVM host running on CentOS 7.1
    * ACS Agent and Common RPMs built from source
    
    ```
    Create a redundant VPC with two networks with two VMs in each network ... === TestName: test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Status : SUCCESS ===
    ok
    Create a redundant VPC with two networks with two VMs in each network and check default routes ... === TestName: test_02_redundant_VPC_default_routes | Status : SUCCESS ===
    ok
    Test iptables default INPUT/FORWARD policy on RouterVM ... === TestName: test_02_routervm_iptables_policies | Status : SUCCESS ===
    ok
    Test iptables default INPUT/FORWARD policies on VPC router ... === TestName: test_01_single_VPC_iptables_policies | Status : SUCCESS ===
    ok
    Create a VPC with two networks with one VM in each network and test nics after destroy ... === TestName: test_01_VPC_nics_after_destroy | Status : SUCCESS ===
    ok
    Create a VPC with two networks with one VM in each network and test default routes ... === TestName: test_02_VPC_default_routes | Status : SUCCESS ===
    ok
    Test to create Load balancing rule with source NAT ... === TestName: test_01_create_lb_rule_src_nat | Status : SUCCESS ===
    ok
    Test to create Load balancing rule with non source NAT ... === TestName: test_02_create_lb_rule_non_nat | Status : SUCCESS ===
    ok
    Test for assign & removing load balancing rule ... === TestName: test_assign_and_removal_lb | Status : SUCCESS ===
    ok
    Stop existing router, add a PF rule and check we can access the VM ... === TestName: test_isolate_network_FW_PF_default_routes | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_RVR_Network_FW_PF_SSH_default_routes | Status : SUCCESS ===
    ok
    
    ----------------------------------------------------------------------
    Ran 11 tests in 8295.897s
    
    OK
    ```
    
    == No Hardware required tests ==
    
    * Management Server + MySQL running on CentOS 7.1
    * Two KVM hosts running on CentOS 7.1
    * ACS Agent and Common RPMs built from source
    
    ```
    Test router internal advanced zone ... === TestName: test_02_router_internal_adv | Status : SUCCESS ===
    ok
    Test restart network ... === TestName: test_03_restart_network_cleanup | Status : SUCCESS ===
    ok
    Test router basic setup ... === TestName: test_05_router_basic | Status : SUCCESS ===
    ok
    Test router advanced setup ... === TestName: test_06_router_advanced | Status : SUCCESS ===
    ok
    Test stop router ... === TestName: test_07_stop_router | Status : SUCCESS ===
    ok
    Test start router ... === TestName: test_08_start_router | Status : SUCCESS ===
    ok
    Test reboot router ... === TestName: test_09_reboot_router | Status : SUCCESS ===
    ok
    test_privategw_acl (integration.smoke.test_privategw_acl.TestPrivateGwACL) ... === TestName: test_privategw_acl | Status : SUCCESS ===
    ok
    Test VPN in VPC ... === TestName: test_vpc_remote_access_vpn | Status : SUCCESS ===
    ok
    Test VPN in VPC ... === TestName: test_vpc_site2site_vpn | Status : SUCCESS ===
    ok
    Test to create service offering ... === TestName: test_01_create_service_offering | Status : SUCCESS ===
    ok
    Test to update existing service offering ... === TestName: test_02_edit_service_offering | Status : SUCCESS ===
    ok
    Test to delete service offering ... === TestName: test_03_delete_service_offering | Status : SUCCESS ===
    ok
    Test create VPC offering ... === TestName: test_01_create_vpc_offering | Status : SUCCESS ===
    ok
    
    Test VPC offering without load balancing service ... === TestName: test_03_vpc_off_without_lb | Status : EXCEPTION ===
    ERROR
    Test VPC offering without static NAT service ... === TestName: test_04_vpc_off_without_static_nat | Status : EXCEPTION ===
    ERROR
    Test VPC offering without port forwarding service ... === TestName: test_05_vpc_off_without_pf | Status : EXCEPTION ===
    ERROR
    
    KNOWN ISSUE => https://issues.apache.org/jira/browse/CLOUDSTACK-8935
    
    Test VPC offering with invalid services ... === TestName: test_06_vpc_off_invalid_services | Status : SUCCESS ===
    ok
    Test update VPC offering ... === TestName: test_07_update_vpc_off | Status : SUCCESS ===
    ok
    Test list VPC offering ... === TestName: test_08_list_vpc_off | Status : SUCCESS ===
    ok
    test_09_create_redundant_vpc_offering (integration.component.test_vpc_offerings.TestVPCOffering) ... === TestName: test_09_create_redundant_vpc_offering | Status : SUCCESS ===
    ok
    Test start/stop of router after addition of one guest network ... === TestName: test_01_start_stop_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test reboot of router after addition of one guest network ... === TestName: test_02_reboot_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test to change service offering of router after addition of one guest network ... === TestName: test_04_chg_srv_off_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test destroy of router after addition of one guest network ... === TestName: test_05_destroy_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test to stop and start router after creation of VPC ... === TestName: test_01_stop_start_router_after_creating_vpc | Status : SUCCESS ===
    ok
    Test to reboot the router after creating a VPC ... === TestName: test_02_reboot_router_after_creating_vpc | Status : SUCCESS ===
    ok
    Tests to change service offering of the Router after ... === TestName: test_04_change_service_offerring_vpc | Status : SUCCESS ===
    ok
    Test to destroy the router after creating a VPC ... === TestName: test_05_destroy_router_after_creating_vpc | Status : SUCCESS ===
    ok
    Test advanced zone virtual router ... === TestName: test_advZoneVirtualRouter | Status : SUCCESS ===
    ok
    Test Multiple Deploy Virtual Machine ... === TestName: test_deploy_vm_multiple | Status : SUCCESS ===
    ok
    Test Stop Virtual Machine ... === TestName: test_01_stop_vm | Status : SUCCESS ===
    ok
    Test Start Virtual Machine ... === TestName: test_02_start_vm | Status : SUCCESS ===
    ok
    Test Reboot Virtual Machine ... === TestName: test_03_reboot_vm | Status : SUCCESS ===
    ok
    Test destroy Virtual Machine ... === TestName: test_06_destroy_vm | Status : SUCCESS ===
    ok
    Test recover Virtual Machine ... === TestName: test_07_restore_vm | Status : SUCCESS ===
    ok
    Test migrate VM ... === TestName: test_08_migrate_vm | Status : SUCCESS ===
    ok
    Test destroy(expunge) Virtual Machine ... === TestName: test_09_expunge_vm | Status : SUCCESS ===
    ok
    Test reset virtual machine on reboot ... === TestName: test_01_reset_vm_on_reboot | Status : SUCCESS ===
    ok
    
    ----------------------------------------------------------------------
    Ran 40 tests in 8349.607s
    ```
    
    * Manual Tests
      * Created Redundant Network Offering
      * Created 2 VMs
      * Acquired 1 new IP
      * Created LB rule
      * Opened FW
    
    ```
    [wrodrigues@mct-wrodrigues-g9 ~]$ ssh root@192.168.23.8
    The authenticity of host '192.168.23.8 (192.168.23.8)' can't be established.
    ECDSA key fingerprint is ca:ae:de:89:e8:10:39:9d:c6:ff:ad:b3:87:db:d4:57.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '192.168.23.8' (ECDSA) to the list of known hosts.
    root@192.168.23.8's password: 
    # ping 8.8.8.8
    PING 8.8.8.8 (8.8.8.8): 56 data bytes
    64 bytes from 8.8.8.8: seq=0 ttl=48 time=10.955 ms
    64 bytes from 8.8.8.8: seq=1 ttl=48 time=10.861 ms
    ^C
    --- 8.8.8.8 ping statistics ---
    2 packets transmitted, 2 packets received, 0% packet loss
    round-trip min/avg/max = 10.861/10.908/10.955 ms
    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 02:00:4d:f9:00:03 brd ff:ff:ff:ff:ff:ff
        inet 10.1.1.67/24 brd 10.1.1.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::4dff:fef9:3/64 scope link 
           valid_lft forever preferred_lft forever
    # exit
    Connection to 192.168.23.8 closed.
    [wrodrigues@mct-wrodrigues-g9 ~]$ rm -rf ~/.ssh/known_hosts 
    [wrodrigues@mct-wrodrigues-g9 ~]$ ssh root@192.168.23.8
    The authenticity of host '192.168.23.8 (192.168.23.8)' can't be established.
    ECDSA key fingerprint is 6f:b0:b4:2f:f7:fb:69:a8:5b:eb:0f:ed:97:07:6e:8c.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '192.168.23.8' (ECDSA) to the list of known hosts.
    root@192.168.23.8's password: 
    # ping 8.8.8.8
    PING 8.8.8.8 (8.8.8.8): 56 data bytes
    64 bytes from 8.8.8.8: seq=0 ttl=48 time=14.027 ms
    64 bytes from 8.8.8.8: seq=1 ttl=48 time=10.417 ms
    ^C
    --- 8.8.8.8 ping statistics ---
    2 packets transmitted, 2 packets received, 0% packet loss
    round-trip min/avg/max = 10.417/12.222/14.027 ms
    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 02:00:03:4c:00:04 brd ff:ff:ff:ff:ff:ff
        inet 10.1.1.66/24 brd 10.1.1.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::3ff:fe4c:4/64 scope link 
           valid_lft forever preferred_lft forever
    # 
    ```


> The redundant routers are facing a race condition due to several KeepaliveD/ConntrackD restarts
> -----------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-8952
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8952
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Virtual Router
>    Affects Versions: 4.6.0
>            Reporter: Wilder Rodrigues
>            Assignee: Wilder Rodrigues
>            Priority: Critical
>             Fix For: 4.6.0
>
>
> In CsRedundant.py we have the following line:
> proc = CsProcess(['/usr/sbin/keepalived', '--vrrp'])
> However, CsProcess cannot find a process matching the search string "--vrrp", so the check always returns false and keepalived is restarted.
> Because of the restart, the routers enter a race condition to become master, which makes network features unavailable.
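
The failure mode described in the issue can be sketched in Python, the language of CsRedundant.py. This is a minimal, hypothetical model and not CloudStack's actual CsProcess: the helpers `find_pids` and `ensure_running` and the sample `ps aux` lines are illustrative assumptions, and the sketch assumes the running keepalived command line contains no literal `--vrrp` token. It shows why a lookup that requires every search term to be present finds nothing, so a "restart if not running" check restarts keepalived on every run.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical `ps aux` lines. Assumption: the running keepalived
# processes show no literal "--vrrp" token in their command line.
PS_OUTPUT = [
    "root   1 0.0 0.1 1234 567 ? Ss 10:00 0:01 /sbin/init",
    "root 812 0.0 0.2 4321 765 ? Ss 10:01 0:00 /usr/sbin/keepalived",
    "root 813 0.0 0.2 4321 765 ? S  10:01 0:00 /usr/sbin/keepalived",
]


def find_pids(ps_lines: Sequence[str], search: Sequence[str]) -> List[str]:
    """Return the PIDs whose command line contains every term in `search`."""
    pids = []
    for line in ps_lines:
        columns = re.split(r"\s+", line.strip())
        # `ps aux` columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
        pid, cmdline = columns[1], columns[10:]
        if all(term in cmdline for term in search):
            pids.append(pid)
    return pids


def ensure_running(ps_lines: Sequence[str], search: Sequence[str],
                   restart: Callable[[], None]) -> bool:
    """Call `restart` only when no matching process exists; True if it did."""
    if find_pids(ps_lines, search):
        return False
    restart()
    return True


if __name__ == "__main__":
    restarts: List[str] = []
    # Requiring "--vrrp" never matches, so every check triggers a restart.
    print(ensure_running(PS_OUTPUT, ["/usr/sbin/keepalived", "--vrrp"],
                         lambda: restarts.append("keepalived")))  # True
    # Searching for the binary alone finds PIDs 812 and 813: no restart.
    print(ensure_running(PS_OUTPUT, ["/usr/sbin/keepalived"],
                         lambda: restarts.append("keepalived")))  # False
    print(restarts)  # ['keepalived']
```

In this model, every router that runs the check restarts keepalived each time, which is the trigger for the master election race reported in the issue. Matching only on terms that really appear in the running command line avoids the spurious restart; this is one possible guard, not necessarily the change made in pull request 940.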



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
