cloudstack-dev mailing list archives

From ilya <ilya.mailing.li...@gmail.com>
Subject Re: Roadmap for 4.x and 5.0
Date Tue, 05 Jul 2016 22:43:42 GMT
Marc

You are correct that my shell script is not the most robust. It should be
rewritten in Java and called on "graceful" shutdown; for now, treat this
script as a POC.
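One concrete robustness gap: the script reads db.properties with `grep key | awk -F'='`. That also matches lines that merely contain the key string, commented-out lines included, and it cuts off values that themselves contain '='. Below is a minimal sketch of an exact-match reader, assuming plain key=value lines. `prop_get` is a hypothetical helper name, not a CloudStack tool.

```shell
#!/usr/bin/env bash
# Sketch: read one key from a Java-style key=value properties file.
# Matches the key exactly (no substring hits) and keeps any '=' in the value.
# prop_get is a hypothetical helper name, not part of CloudStack.
prop_get() {
  local file=$1 key=$2
  # -v passes the key as data, so the dots in it are not regex wildcards.
  awk -v k="$key" '
    /^[ \t]*#/ { next }                        # skip comment lines
    {
      i = index($0, "=")
      if (i == 0) next                         # not a key=value line
      name = substr($0, 1, i - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)        # trim the key
      if (name == k) val = substr($0, i + 1)   # last match wins
    }
    END { if (val != "") print val }
  ' "$file"
}
```

With it, a lookup becomes e.g. `DBHOST=$(prop_get /etc/cloudstack/management/db.properties db.cloud.host)`.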

What it guards against is more than just snapshots, though: basically any
async operation that would hurt the end-user experience if I were to take
down one of the MS servers.

I front my MS servers with a VIP, so when I take one of them down
gracefully via the script below, all the agents reconnect to the next MS.
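Instead of bailing out when jobs are in flight, the check could wait for the MS to drain. Below is a minimal sketch. It assumes the job count comes from a command, such as the script's async_job query wrapped in a function (`count_inflight_jobs` is a hypothetical name). It returns 0 when drained, 1 on timeout and 2 if the counter itself fails.

```shell
#!/usr/bin/env bash
# Sketch: poll until no async jobs are in flight, or give up after a timeout.
# Usage: wait_for_drain TIMEOUT_S INTERVAL_S COUNTER_CMD [ARGS...]
# COUNTER_CMD must print a single integer (e.g. a hypothetical
# count_inflight_jobs wrapping "SELECT COUNT(*) FROM async_job ...").
wait_for_drain() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0 jobs
  while :; do
    jobs=$("$@") || return 2                 # the counter itself failed
    [ "$jobs" -eq 0 ] && return 0            # drained: safe to stop the MS
    if [ "$waited" -ge "$timeout" ]; then
      echo "WARN: still $jobs job(s) in flight after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, `wait_for_drain 300 10 count_inflight_jobs && service cloudstack-management stop`. Blocking 8080 before polling, rather than after the check, closes the window in which a user API call could start a job between the check and the block. Jobs the MS starts on its own are not stopped by the firewall rule, which is why polling still matters.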



The current "cold cross-cluster" migration works by copying the data disk
to secondary storage and then back to primary. For a VM with a 4 TB data
disk that is not feasible, for several reasons: the NFS export for the SSVM
may not be that large, and copying to NFS and back to primary is slow even
on a robust network. A direct migration that bypasses secondary storage
would be far more efficient.
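To put "slow" in numbers: the cold path copies the disk twice, while a direct primary-to-primary migration copies it once. Below is a back-of-envelope sketch. It assumes decimal terabytes and a sustained 125 MB/s, roughly a fully used 1 Gbit/s link. That rate is an assumed figure, not a measurement.

```shell
#!/usr/bin/env bash
# Back-of-envelope: time for the two copies of a cold cross-cluster migration
# (primary -> secondary NFS, then secondary -> primary).
# Usage: copy_time DISK_TB MB_PER_S   (both integers; rate is an assumption)
copy_time() {
  local disk_tb=$1 mb_per_s=$2
  local bytes=$((disk_tb * 1000 * 1000 * 1000 * 1000))   # decimal TB
  local per_leg=$((bytes / (mb_per_s * 1000 * 1000)))     # seconds per copy
  local total=$((2 * per_leg))                            # two legs
  printf '%dh%02dm\n' $((total / 3600)) $((total % 3600 / 60))
}

copy_time 4 125    # prints 17h46m
```

At that rate each leg takes about 8.9 hours, so bypassing secondary storage roughly halves the transfer time, before even counting the NFS capacity issue.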

As for secure KVM migration: each migrate call establishes a one-time SSH
key pair between the two KVM hosts, used only for the duration of that
migration. The pair is cleared once the operation completes, which removes
the possibility of someone exploiting the cloud user's SSH keys.
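The message does not say how the one-time key is created or revoked. Below is a minimal sketch of the lifecycle only, assuming OpenSSH's `ssh-keygen` is available. The key pair lives in a private temporary directory and is deleted when the wrapped command finishes, whether it succeeds or fails. `with_ephemeral_key` and `EPHEMERAL_KEY` are hypothetical names. Installing the public half on the destination host, and removing it again, is left out.

```shell
#!/usr/bin/env bash
# Sketch: run a command with a throwaway SSH key pair that exists only for
# the lifetime of that command. Assumes OpenSSH's ssh-keygen is installed.
# with_ephemeral_key and EPHEMERAL_KEY are hypothetical names.
with_ephemeral_key() (
  set -e
  keydir=$(mktemp -d)                     # normally created with mode 0700
  # Delete the key pair when this subshell exits, on success or failure.
  trap 'rm -rf "$keydir"' EXIT
  # -q: quiet, -N "": no passphrase, -C: key comment, -f: output path.
  ssh-keygen -q -t ed25519 -N "" -C "one-time-migration" \
    -f "$keydir/id_ed25519" >/dev/null
  export EPHEMERAL_KEY="$keydir/id_ed25519"
  # A real migration would install "$EPHEMERAL_KEY.pub" on the destination
  # host before this point and remove it again afterwards (not shown).
  "$@"
)
```

Defining the function with `( ... )` instead of `{ ... }` runs it in a subshell. The EXIT trap therefore fires when the function returns, not when the calling script ends.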

This is not a big deal for cloud hosting companies, but it is a big deal
for enterprise security teams who run CloudStack as a private cloud. We
don't want cloud user keys littered everywhere; that is far from ideal
security-wise.

Regards
ilya



On 7/3/16 10:41 PM, marco@exoscale.ch wrote:
> Hi Ilya,
> 
> Regarding the live migration, we are using it in production and migrated a
> couple of VMs until we reached some corner cases, for which I wrote a few
> fixes. We'll verify them over the following weeks. The code is based on CS
> 4.4, but I have started porting it to master; I still have to finish that
> and merge the fixes. Cold migration is already in CS and we have been using
> it for a while.
> What do you mean by secure KVM migration? My code reads configuration values
> that let you set up a TLS peer-to-peer connection between the agents and
> transfer all the data over it, using libvirt's features. That is the setup
> we have in production.
> 
> For the graceful shutdown, we have HAProxy in front, so we just edit its
> configuration to take one MS out. We also check manually that no snapshot
> is in progress before launching the stop/start, but I don't find this very
> robust. So I read a lot of the code that manages the agents and how they
> connect to the MS. There is already a command to rebalance agents between
> MSs, so I'm developing a solution around that.
> 
> Kind regards,
> Marc-Aurèle
> 
> 
>> On 02 Jul 2016, at 02:03, ilya <ilya.mailing.lists@gmail.com> wrote:
>>
>> Marco,
>>
>> I've written a small shell script that does the following:
>>
>> Makes sure no async jobs are running, and blocks port 8080 via iptables
>> so users can't connect to an MS that's about to go down.
>>
>> It needs a bit of enhancement (it should look up the MSID of that specific
>> server). It looks something like this; consider borrowing concepts if
>> applicable.
>>
>>> #!/bin/bash
>>> # POC: restart the local management server (MS) only when no async jobs
>>> # are in flight, blocking user API traffic on 8080 while it is down.
>>> PROPS=/etc/cloudstack/management/db.properties
>>> DATESTAMP=$(date +%m%d%y-%H%M%S)
>>> # db.cloud.password is stored as ENC(...); decrypt it with jasypt and the MS key.
>>> DBPASS=$(java -classpath /usr/share/cloudstack-common/lib/jasypt-1.9.0.jar \
>>>     org.jasypt.intf.cli.JasyptPBEStringDecryptionCLI \
>>>     input="$(grep db.cloud.password "$PROPS" | awk -F'(' '{print $2}' | sed 's/)//g')" \
>>>     password="$(cat /etc/cloudstack/management/key)" | grep -A2 OUTPUT | tail -1)
>>> DBHOST=$(grep db.cloud.host "$PROPS" | awk -F'=' '{print $2}' | tail -1)
>>> DBUSER=$(grep db.cloud.username "$PROPS" | awk -F'=' '{print $2}')
>>> DB=$(grep db.cloud.name "$PROPS" | awk -F'=' '{print $2}')
>>> DBPORT=$(grep db.cloud.port "$PROPS" | awk -F'=' '{print $2}')
>>> # MYSQL_PWD keeps the password off the mysql command line (ps output);
>>> # -N -B prints bare, tab-separated rows without a column header.
>>> export MYSQL_PWD="$DBPASS"
>>> MYSQLCMD="mysql -N -B -h $DBHOST -u $DBUSER -P $DBPORT $DB"
>>> INFLIGHT="FROM async_job WHERE job_status=0 AND job_dispatcher != 'pseudoJobDispatcher'"
>>>
>>> # COUNT(*) returns a single number, with no header line to discount.
>>> JOBS=$(echo "SELECT COUNT(*) $INFLIGHT" | $MYSQLCMD)
>>> if [ -z "$JOBS" ]; then
>>>     echo "ERROR: Could not query async_job, aborting"
>>>     exit 1
>>> fi
>>> if [ "$JOBS" -gt 0 ]; then
>>>     echo "WARN: Looks like I have active jobs in flight, please try again later"
>>>     echo "SELECT * $INFLIGHT" | $MYSQLCMD
>>>     exit 1
>>> fi
>>>
>>> echo "NOTE: No jobs running, good to go!"
>>> echo "NOTE: Blocking incoming 8080"
>>> # -I inserts the DROP at the top of INPUT so an earlier ACCEPT can't bypass it.
>>> /sbin/iptables -I INPUT -p tcp --destination-port 8080 -j DROP
>>> # Read the PID before stopping: the pid file may be gone afterwards.
>>> CSPID=$(cat /var/run/cloudstack-management.pid 2>/dev/null)
>>> service cloudstack-management stop
>>> if [ -n "$CSPID" ] && ps -p "$CSPID" >/dev/null 2>&1; then
>>>     kill -9 "$CSPID"
>>>     sleep 2
>>> fi
>>> if [ -n "$CSPID" ] && ps -p "$CSPID" >/dev/null 2>&1; then
>>>     # exit here ends the script itself; inside ( ... ) it would only end a subshell.
>>>     echo "ERROR: Could not terminate cloudstack service on $(hostname) with pid $CSPID"
>>>     /sbin/iptables -D INPUT -p tcp --destination-port 8080 -j DROP
>>>     exit 1
>>> fi
>>> service cloudstack-management start
>>> echo "NOTE: Unblocking incoming 8080"
>>> /sbin/iptables -D INPUT -p tcp --destination-port 8080 -j DROP
>>
>> Regards,
>> ilya
>>
>> On 7/1/16 3:30 AM, marco@exoscale.ch wrote:
>>> Hi,
>>>
>>> I can't edit the page, but I'd be glad to put some effort into V5:
>>> - Live migration for KVM
>>> - Improve logging using UUIDs (I already did part of that for us at Exoscale)
>>>
>>> I'm in the process of adding another feature we need: graceful shutdown of a
>>> management server when running a cluster of MSs. The goal is to send a
>>> "prepareForShutdown" command to one or more MSs and have them rebalance their
>>> agents to the ones still running, so that no command is lost. Then there
>>> shouldn't be any agent downtime during an update.
>>>
>>> Kind regards,
>>> Marc-Aurèle
>>>
>>> PS: Is there any architectural discussion going on in the Slack channel? I
>>> saw that IRC is not very active...
>>>
>>>
>>>> On 01 Jul 2016, at 11:55, Paul Angus <paul.angus@shapeblue.com> wrote:
>>>>
>>>> There's not been much response to this, but I'll start clearing away the
>>>> unclaimed items; people can always add them back.
>>>>
>>>>
>>>> Kind regards,
>>>>
>>>> Paul Angus
>>>>
>>>>
>>>> paul.angus@shapeblue.com 
>>>> www.shapeblue.com
>>>> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
>>>> @shapeblue
>>>>
>>>>
>>>>
>>>
> 
