Date: Sat, 17 Oct 2015 16:00:06 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: cloudstack-issues@incubator.apache.org
Reply-To: dev@cloudstack.apache.org
Subject: [jira] [Commented] (CLOUDSTACK-8952) The redundant routers are facing a race condition due to several KeepaliveD/ConntrackD restarts

[ https://issues.apache.org/jira/browse/CLOUDSTACK-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961951#comment-14961951 ]

ASF GitHub Bot commented on CLOUDSTACK-8952:
--------------------------------------------

Github user wilderrodrigues commented on the pull request:

https://github.com/apache/cloudstack/pull/940#issuecomment-148926525

Hi @remibergsma @karuturi @miguelaferreira @wido @borisroman @bhaisaab
@bvbharat

Please have a look at this PR.

== Hardware required tests ==

* Management Server + MySQL running on CentOS 7.1
* One KVM host running on CentOS 7.1
* ACS Agent and Common RPMs built from source

```
Create a redundant VPC with two networks with two VMs in each network ... === TestName: test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Status : SUCCESS ===
ok
Create a redundant VPC with two networks with two VMs in each network and check default routes ... === TestName: test_02_redundant_VPC_default_routes | Status : SUCCESS ===
ok
Test iptables default INPUT/FORWARD policy on RouterVM ... === TestName: test_02_routervm_iptables_policies | Status : SUCCESS ===
ok
Test iptables default INPUT/FORWARD policies on VPC router ... === TestName: test_01_single_VPC_iptables_policies | Status : SUCCESS ===
ok
Create a VPC with two networks with one VM in each network and test nics after destroy ... === TestName: test_01_VPC_nics_after_destroy | Status : SUCCESS ===
ok
Create a VPC with two networks with one VM in each network and test default routes ... === TestName: test_02_VPC_default_routes | Status : SUCCESS ===
ok
Test to create Load balancing rule with source NAT ... === TestName: test_01_create_lb_rule_src_nat | Status : SUCCESS ===
ok
Test to create Load balancing rule with non source NAT ... === TestName: test_02_create_lb_rule_non_nat | Status : SUCCESS ===
ok
Test for assign & removing load balancing rule ... === TestName: test_assign_and_removal_lb | Status : SUCCESS ===
ok
Stop existing router, add a PF rule and check we can access the VM ... === TestName: test_isolate_network_FW_PF_default_routes | Status : SUCCESS ===
ok
Test redundant router internals ...
=== TestName: test_RVR_Network_FW_PF_SSH_default_routes | Status : SUCCESS ===
ok

----------------------------------------------------------------------
Ran 11 tests in 8295.897s

OK
```

== No Hardware required tests ==

* Management Server + MySQL running on CentOS 7.1
* Two KVM hosts running on CentOS 7.1
* ACS Agent and Common RPMs built from source

```
Test router internal advanced zone ... === TestName: test_02_router_internal_adv | Status : SUCCESS ===
ok
Test restart network ... === TestName: test_03_restart_network_cleanup | Status : SUCCESS ===
ok
Test router basic setup ... === TestName: test_05_router_basic | Status : SUCCESS ===
ok
Test router advanced setup ... === TestName: test_06_router_advanced | Status : SUCCESS ===
ok
Test stop router ... === TestName: test_07_stop_router | Status : SUCCESS ===
ok
Test start router ... === TestName: test_08_start_router | Status : SUCCESS ===
ok
Test reboot router ... === TestName: test_09_reboot_router | Status : SUCCESS ===
ok
test_privategw_acl (integration.smoke.test_privategw_acl.TestPrivateGwACL) ... === TestName: test_privategw_acl | Status : SUCCESS ===
ok
Test VPN in VPC ... === TestName: test_vpc_remote_access_vpn | Status : SUCCESS ===
ok
Test VPN in VPC ... === TestName: test_vpc_site2site_vpn | Status : SUCCESS ===
ok
Test to create service offering ... === TestName: test_01_create_service_offering | Status : SUCCESS ===
ok
Test to update existing service offering ... === TestName: test_02_edit_service_offering | Status : SUCCESS ===
ok
Test to delete service offering ... === TestName: test_03_delete_service_offering | Status : SUCCESS ===
ok
Test create VPC offering ... === TestName: test_01_create_vpc_offering | Status : SUCCESS ===
ok
Test VPC offering without load balancing service ... === TestName: test_03_vpc_off_without_lb | Status : EXCEPTION ===
ERROR
Test VPC offering without static NAT service ...
=== TestName: test_04_vpc_off_without_static_nat | Status : EXCEPTION ===
ERROR
Test VPC offering without port forwarding service ... === TestName: test_05_vpc_off_without_pf | Status : EXCEPTION ===
ERROR

KNOWN ISSUE => https://issues.apache.org/jira/browse/CLOUDSTACK-8935

Test VPC offering with invalid services ... === TestName: test_06_vpc_off_invalid_services | Status : SUCCESS ===
ok
Test update VPC offering ... === TestName: test_07_update_vpc_off | Status : SUCCESS ===
ok
Test list VPC offering ... === TestName: test_08_list_vpc_off | Status : SUCCESS ===
ok
test_09_create_redundant_vpc_offering (integration.component.test_vpc_offerings.TestVPCOffering) ... === TestName: test_09_create_redundant_vpc_offering | Status : SUCCESS ===
ok
Test start/stop of router after addition of one guest network ... === TestName: test_01_start_stop_router_after_addition_of_one_guest_network | Status : SUCCESS ===
ok
Test reboot of router after addition of one guest network ... === TestName: test_02_reboot_router_after_addition_of_one_guest_network | Status : SUCCESS ===
ok
Test to change service offering of router after addition of one guest network ... === TestName: test_04_chg_srv_off_router_after_addition_of_one_guest_network | Status : SUCCESS ===
ok
Test destroy of router after addition of one guest network ... === TestName: test_05_destroy_router_after_addition_of_one_guest_network | Status : SUCCESS ===
ok
Test to stop and start router after creation of VPC ... === TestName: test_01_stop_start_router_after_creating_vpc | Status : SUCCESS ===
ok
Test to reboot the router after creating a VPC ... === TestName: test_02_reboot_router_after_creating_vpc | Status : SUCCESS ===
ok
Tests to change service offering of the Router after ... === TestName: test_04_change_service_offerring_vpc | Status : SUCCESS ===
ok
Test to destroy the router after creating a VPC ...
=== TestName: test_05_destroy_router_after_creating_vpc | Status : SUCCESS ===
ok
Test advanced zone virtual router ... === TestName: test_advZoneVirtualRouter | Status : SUCCESS ===
ok
Test Multiple Deploy Virtual Machine ... === TestName: test_deploy_vm_multiple | Status : SUCCESS ===
ok
Test Stop Virtual Machine ... === TestName: test_01_stop_vm | Status : SUCCESS ===
ok
Test Start Virtual Machine ... === TestName: test_02_start_vm | Status : SUCCESS ===
ok
Test Reboot Virtual Machine ... === TestName: test_03_reboot_vm | Status : SUCCESS ===
ok
Test destroy Virtual Machine ... === TestName: test_06_destroy_vm | Status : SUCCESS ===
ok
Test recover Virtual Machine ... === TestName: test_07_restore_vm | Status : SUCCESS ===
ok
Test migrate VM ... === TestName: test_08_migrate_vm | Status : SUCCESS ===
ok
Test destroy(expunge) Virtual Machine ... === TestName: test_09_expunge_vm | Status : SUCCESS ===
ok
Test reset virtual machine on reboot ... === TestName: test_01_reset_vm_on_reboot | Status : SUCCESS ===
ok

----------------------------------------------------------------------
Ran 40 tests in 8349.607s
```

* Manual Tests
  * Created Redundant Network Offering
  * Created 2 VMs
  * Acquired 1 new IP
  * Created LB rule
  * Opened FW

```
[wrodrigues@mct-wrodrigues-g9 ~]$ ssh root@192.168.23.8
The authenticity of host '192.168.23.8 (192.168.23.8)' can't be established.
ECDSA key fingerprint is ca:ae:de:89:e8:10:39:9d:c6:ff:ad:b3:87:db:d4:57.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.23.8' (ECDSA) to the list of known hosts.
root@192.168.23.8's password:
# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=48 time=10.955 ms
64 bytes from 8.8.8.8: seq=1 ttl=48 time=10.861 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 10.861/10.908/10.955 ms
# ip addr
1: lo: mtu 65536 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 02:00:4d:f9:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.67/24 brd 10.1.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4dff:fef9:3/64 scope link
       valid_lft forever preferred_lft forever
# exit
Connection to 192.168.23.8 closed.
[wrodrigues@mct-wrodrigues-g9 ~]$ rm -rf ~/.ssh/known_hosts
[wrodrigues@mct-wrodrigues-g9 ~]$ ssh root@192.168.23.8
The authenticity of host '192.168.23.8 (192.168.23.8)' can't be established.
ECDSA key fingerprint is 6f:b0:b4:2f:f7:fb:69:a8:5b:eb:0f:ed:97:07:6e:8c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.23.8' (ECDSA) to the list of known hosts.
root@192.168.23.8's password:
# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=48 time=14.027 ms
64 bytes from 8.8.8.8: seq=1 ttl=48 time=10.417 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 10.417/12.222/14.027 ms
# ip addr
1: lo: mtu 65536 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 02:00:03:4c:00:04 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.66/24 brd 10.1.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3ff:fe4c:4/64 scope link
       valid_lft forever preferred_lft forever
#
```

> The redundant routers are facing a race condition due to several KeepaliveD/ConntrackD restarts
> -----------------------------------------------------------------------------------------------
>
> Key: CLOUDSTACK-8952
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8952
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the default.)
> Components: Virtual Router
> Affects Versions: 4.6.0
> Reporter: Wilder Rodrigues
> Assignee: Wilder Rodrigues
> Priority: Critical
> Fix For: 4.6.0
>
>
> In the CsRedundant.py we have a line doing:
> proc = CsProcess(['/usr/sbin/keepalived', '--vrrp'])
> However, the CsProcess cannot find a process with the string search "--vrrp", which makes it always return false and restart keepalived.
> Due to the restart, the routers start a race condition to become master, which makes network features unavailable.
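
The failure mode described in the issue can be illustrated with a small sketch. This is not the actual CsProcess code: `find_pids` and the `PS_LINES` sample are hypothetical, and the sketch assumes that the lookup requires every search token to appear in the process command line and that keepalived shows up in the process table without a `--vrrp` argument.

```python
def find_pids(search, ps_lines):
    """Return PIDs whose command line contains every token in `search`.

    Hypothetical stand-in for a CsProcess-style lookup; the real
    implementation in CloudStack may differ.
    """
    pids = []
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 11:
            continue  # not a full `ps aux`-style row
        pid, command = fields[1], fields[10:]
        if all(token in command for token in search):
            pids.append(pid)
    return pids


# Made-up `ps aux`-style rows: keepalived runs, but without "--vrrp".
PS_LINES = [
    "root 1 0.0 0.1 10000 800 ? Ss 10:00 0:01 init [2]",
    "root 2042 0.0 0.2 40000 1200 ? Ss 10:01 0:00 /usr/sbin/keepalived",
    "root 2043 0.0 0.3 42000 1500 ? S 10:01 0:00 /usr/sbin/keepalived",
]

# Search used in the issue: never matches, so the caller sees "not running".
running_strict = find_pids(['/usr/sbin/keepalived', '--vrrp'], PS_LINES)

# Searching for the binary alone finds the running processes.
running_loose = find_pids(['/usr/sbin/keepalived'], PS_LINES)

# A caller that restarts whenever the lookup is empty restarts every time.
would_restart = not running_strict
```

Under these assumptions the strict search comes back empty even though keepalived is running, so any logic of the form "restart if not found" fires on every configuration pass, which matches the repeated restarts the issue reports.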