cloudstack-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-8616) Redundant VPR with both routers as Master
Date Tue, 14 Jul 2015 13:05:04 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626313#comment-14626313 ]

ASF GitHub Bot commented on CLOUDSTACK-8616:
--------------------------------------------

GitHub user wilderrodrigues opened a pull request:

    https://github.com/apache/cloudstack/pull/587

    CLOUDSTACK-8616: Redundant VPR with both routers as Master

    This PR contains some refactoring of the Python code used by the redundant routers, and a fix for the intermittent failure seen when running the rVPC component tests.
    
    To summarise it:
    
    * If the KeepaliveD configuration file changes, restart the service instead of reloading it.
    * Since we configure KeepaliveD/VRRP in non-preemptive mode, we no longer need priorities. In fact, the Management Server was no longer sending priorities to the routers; the value used in the old configuration defaulted to 99 in the Python code.
    * Once KeepaliveD and ConntrackD are configured in a router, they get a cron job that runs on reboot. The services are therefore restarted without waiting for the Management Server to send configuration and force a restart.
    * KeepaliveD is installed from Wheezy-Backports in order to get a newer version.
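The first bullet boils down to "compare, write, then restart only if something changed". A minimal sketch of that rule follows; `update_config` and `service_action` are hypothetical helpers, not the functions used in the router's Python code, and the file name is only an example.

```python
import os
import tempfile


def update_config(path, new_content):
    """Write new_content to path and return True if the file changed."""
    old_content = None
    if os.path.exists(path):
        with open(path) as fh:
            old_content = fh.read()
    if old_content == new_content:
        return False  # identical: leave the file and the service alone
    with open(path, "w") as fh:
        fh.write(new_content)
    return True


def service_action(changed):
    """Map "config changed" to a service action (hypothetical helper)."""
    # Per the PR: a changed KeepaliveD config gets a full restart, not a
    # reload. None means the running service is left untouched.
    return "restart" if changed else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = os.path.join(tmp, "keepalived.conf")
        print(service_action(update_config(conf, "vrrp_instance a {}\n")))
        print(service_action(update_config(conf, "vrrp_instance a {}\n")))
```

Run as a script, this prints `restart` for the first write and `None` for the identical second write.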
    
    I have already squashed a few commits of this PR so we don't have to go through simple fixes/typos made during the tryouts. When opening the commits for review, please note that the commit messages also contain the messages of the squashed commits.
    
    Adding the cron job that restarts the KeepaliveD service on reboot raised the test success rate to 60%. Before that, the tests failed about 4 out of 5 times.
    
    I then switched from "reload" to "restart" when the configuration changes. With that change applied, I executed the tests successfully 13 times, which gives confidence in the fix.
    
    Tests can be executed with the following command:
    
    nosetests --with-marvin --marvin-config=[your_configuration_file] -s -a tags=advanced,required_hardware=true component/test_vpc_redundant.py
    
    Since marvin/base.py was changed in the previous PR, you will need to rebuild/upgrade your Marvin installation.
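Because the failure was intermittent, confidence came from repeated runs. A small sketch for repeating the command above and counting passing runs is shown here; `build_command` and `run_repeatedly` are hypothetical names, and treating a zero exit code from nosetests as a pass is the only assumption about the runner.

```python
import subprocess


def build_command(config_file):
    """Return the nosetests argv from the PR description."""
    return [
        "nosetests",
        "--with-marvin",
        "--marvin-config=" + config_file,
        "-s",
        "-a", "tags=advanced,required_hardware=true",
        "component/test_vpc_redundant.py",
    ]


def run_repeatedly(config_file, runs, runner=subprocess.run):
    """Run the rVPC suite `runs` times and return how many runs passed."""
    passed = 0
    for _ in range(runs):
        result = runner(build_command(config_file))
        if result.returncode == 0:  # nosetests exits 0 when all tests pass
            passed += 1
    return passed
```

The `runner` parameter exists only so the loop can be exercised without a Marvin setup; in real use the default `subprocess.run` is called.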
    
    @DaanHoogland @bhaisaab @remibergsma, could you please have a look at this PR?
    
    Cheers,
    Wilder


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/schubergphilis/cloudstack fix/CLOUDSTACK-8616

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/587.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #587
    
----
commit c35c6661696ab3c3c1ddfb6794bd293a76b2463b
Author: wilderrodrigues <wrodrigues@schubergphilis.com>
Date:   2015-07-08T05:24:35Z

    CLOUDSTACK-8616 - Removing the Priority from KeepaliveD configuration
    
       - We use no-preempt mode with the state set EQUAL on both nodes, so there is no need to set up priorities
       - Do not add IPs as comments to the configuration. If a new guest interface is added, the file will change anyway.
         - This was used in the past when keepalived would restart for each new interface added
       - Removed the long sleep from the tests: we now sleep 5 seconds per PF rule added
    
    CLOUDSTACK-8616 - Fix keepalived.ts/2 files comparison
    
       - Add call to set_fault() in case the router transitions to that state
       - Removing commented out code
    
    CLOUDSTACK-8616 - Fixing check_heartbeat.sh.templ
    
    CLOUDSTACK-8616 - Call set_fault from the check_heartbeat.sh script
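To illustrate the first commit, here is a sketch of a keepalived `vrrp_instance` stanza rendered without a `priority` line and without IP comments. The instance name, interface, `virtual_router_id` and `advert_int` values are assumptions; the one detail taken from keepalived itself is that `nopreempt` only works when the initial state is BACKUP, which is why both nodes get that same state here.

```python
def render_vrrp_instance(name, interface, router_id, vips, state="BACKUP"):
    """Render a keepalived vrrp_instance block with no priority line.

    Both routers get the same initial state; keepalived requires BACKUP
    for nopreempt to take effect. All values are illustrative.
    """
    lines = [
        "vrrp_instance %s {" % name,
        "    state %s" % state,
        "    interface %s" % interface,
        "    virtual_router_id %d" % router_id,
        "    nopreempt",
        "    advert_int 1",  # assumed advertisement interval
        "    virtual_ipaddress {",
    ]
    # Guest IPs appear only here, never as "# ip" comments: adding a
    # guest interface changes this block (and the file) anyway.
    lines += ["        %s" % vip for vip in vips]
    lines += ["    }", "}"]
    return "\n".join(lines) + "\n"


print(render_vrrp_instance("inside_network", "eth2", 51, ["10.1.1.1/24 dev eth2"]))
```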

commit c975185318cbfd00e9d5e346b4fc9ea2c76e8098
Author: wilderrodrigues <wrodrigues@schubergphilis.com>
Date:   2015-07-09T09:40:32Z

    CLOUDSTACK-8616 - Add keepalived start on reboot
    
       - Runs check_heartbeat.sh every 30 seconds
    
    CLOUDSTACK-861 - Copy/Paste error
    
       - Pasted the wrong command into the crontab line.
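cron cannot schedule anything more often than once a minute, so "every 30 seconds" is commonly done with two per-minute entries, the second delayed by `sleep 30`. The sketch below builds `/etc/cron.d`-style lines for that and for the `@reboot` start of KeepaliveD (and, per the last commit, ConntrackD). The helper names and the script path are hypothetical; the PR does not show the exact cron lines it writes.

```python
def heartbeat_cron(script, user="root"):
    """/etc/cron.d lines that run `script` every 30 seconds.

    cron's finest granularity is one minute, so two per-minute entries
    are used, the second delayed by `sleep 30`.
    """
    return [
        "* * * * * %s %s" % (user, script),
        "* * * * * %s sleep 30; %s" % (user, script),
    ]


def reboot_cron(service, action="start", user="root"):
    """/etc/cron.d line that runs `service <name> <action>` once at boot."""
    return "@reboot %s service %s %s" % (user, service, action)


def cron_file(lines):
    """Join entries into file content; cron needs a trailing newline."""
    return "\n".join(lines) + "\n"


# Hypothetical script path, for illustration only.
entries = heartbeat_cron("/opt/cloud/bin/check_heartbeat.sh")
entries.append(reboot_cron("keepalived"))
entries.append(reboot_cron("conntrackd", action="restart"))
print(cron_file(entries), end="")
```

Lines in `/etc/cron.d` carry a user field (`root` here), unlike per-user crontabs.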

commit c20b5f3ff1e56b4db296bd2ec46f0cd8ed538b29
Author: wilderrodrigues <wrodrigues@schubergphilis.com>
Date:   2015-07-10T06:41:28Z

    CLOUDSTACK-8616 - Installing KeepaliveD from Debian Wheezy backports
    
       - preempt delay reverted on version 1.2.13 - from the backports
         - vrrp : Revert "Honor preempt_delay setting on startup.".
         - See changelog: http://www.keepalived.org/changelog.html
       - Refactoring some variable names to avoid misunderstanding
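The backports package matters because of its version: the keepalived changelog lists the preempt_delay revert under 1.2.13. A hedged sketch of a version gate a build or test script could use is shown below; the input string format is an assumption, and the helper names are hypothetical.

```python
import re

MIN_KEEPALIVED = (1, 2, 13)  # release that contains the preempt_delay revert


def parse_version(text):
    """Extract (major, minor, patch) from text like 'Keepalived v1.2.13'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    return tuple(int(part) for part in match.groups())


def has_preempt_delay_revert(text):
    """True if the reported version is at least 1.2.13."""
    return parse_version(text) >= MIN_KEEPALIVED
```

Comparing integer tuples avoids the classic string-comparison bug where "1.2.2" sorts after "1.2.13".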

commit 118d7b79f4f5f15a9931ff1cc6e2cc91a562ee11
Author: wilderrodrigues <wrodrigues@schubergphilis.com>
Date:   2015-07-13T17:29:41Z

    CLOUDSTACK-8616 - Add a cron job to restart ConntrackD on reboot

----


> Redundant VPR with both routers as Master
> -----------------------------------------
>
>                 Key: CLOUDSTACK-8616
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8616
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Virtual Router
>    Affects Versions: 4.6.0
>            Reporter: Wilder Rodrigues
>            Assignee: Wilder Rodrigues
>
> There is an intermittent problem with keepalived on the redundant VPC routers: sometimes both routers stay in Master state for a while.
> We are able to reproduce it only when testing with Marvin, which executes the calls very quickly. When using the UI and following the same steps, it doesn't happen.
> Setting up:
> 1. Create a VPC using redundant VPC offering
> 2. Add 2 Tiers
> 3. Create 2 VMs in each Tier
> 4. Create ACLs to allow traffic on port 22 coming from 0.0.0.0/0
> 5. Acquire 4 public IPs
> 6. Create Port Forwarding rules - per IP - for port 22
> 7. Assign each PF created to one of the VMs
> 8. SSH to the VMs
> Testing fail over:
> 1. Stop the Master Router
> 2. Check that the Backup Router became Master
> 3. SSH to the VMs 
> Testing failure:
> 1. Delete all port forwarding rules
> 2. SSH to the VMs 
> 3. Verify that it no longer works
> Testing recovery:
> 1. Restart the router
> 2. Once the router is running, check that it's on Backup state
> 3. Add the port forwarding rules back
> 4. Verify that the routers are still on the same state: 1 Master and 1 Backup
>     - That's the step where it fails during the Marvin tests
>     - When 2 routers are in Master state, restarting 1 router brings everything back to a normal state: 1 Master and 1 Backup
> 5. SSH to the VMs 
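Step 4 of the recovery test is an invariant on the pair of routers. A minimal sketch of that check, plus the "5 seconds per PF rule" wait from the first commit, could look like this. How the router states are fetched from the Management Server is deliberately left out, and both helper names are hypothetical.

```python
from collections import Counter


def redundant_state_ok(states):
    """True when a redundant pair has exactly one MASTER and one BACKUP."""
    counts = Counter(state.upper() for state in states)
    return len(states) == 2 and counts["MASTER"] == 1 and counts["BACKUP"] == 1


def pf_settle_seconds(rule_count, per_rule=5):
    """Seconds to wait after adding PF rules: 5 s per rule added."""
    return rule_count * per_rule
```

With the 4 port forwarding rules from the setup steps, the test would wait 20 seconds before checking that the routers are still 1 Master and 1 Backup; two routers both reporting MASTER is exactly the failure this issue describes.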



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
