Subject: Re: Roadmap for 4.x and 5.0
To: dev@cloudstack.apache.org
From: ilya
Date: Tue, 5 Jul 2016 15:43:42 -0700
References: <6369CE02-374C-4912-8A08-5BFF8626CA9B@exoscale.ch> <1ecc42d4-218f-35b5-18bf-6f2ff8ecdc86@gmail.com> <01373C59-D8FD-4A26-85E3-1DE33D1270A4@exoscale.ch>
In-Reply-To: <01373C59-D8FD-4A26-85E3-1DE33D1270A4@exoscale.ch>

Marc,

You are correct that my shell script is not the most robust. It should be rewritten in Java and called on "graceful" shutdown; this script should be treated as a POC, I guess.

What it guards against is more than just snapshots, though: basically any async operation that would hurt the end-user experience if I were to take down one of the MS servers.

I front my MS servers with a VIP. As I take down one of the MS servers gracefully via the script below, the agents all reconnect to the next MS.

The current "Cold Cross Cluster" migration, as it stands, is done by copying the data disk to secondary storage and then back to primary. If you have VMs with 4TB data disks, that's not feasible for several reasons: the single NFS export for the SSVM may not be large enough, and it's pretty slow to copy to NFS and back to primary, even if you have a robust network. Hence direct migration bypassing the secondary store would be far more efficient.

In regards to secure KVM migration, each migrate call establishes a one-time SSH key pair between the 2 KVM hosts that is used only for the duration of that migration. It is cleared once the operation completes, which avoids the possibility of someone exploiting the cloud user SSH keys.
This is not a big deal to cloud hosting companies, but it is a big deal to enterprise security folks who run CloudStack as a private cloud. We don't want cloud user keys littered everywhere; that is far from ideal in terms of security.

Regards
ilya

On 7/3/16 10:41 PM, marco@exoscale.ch wrote:
> Hi Ilya,
>
> Regarding the live migration, we are using it in production and migrated a couple of VMs until we reached some corner cases, for which I wrote a few fixes. We'll verify them during the following weeks. The code is based on CS 4.4 but I started porting it to master. I have to finish that and merge the fixes too. For the cold migration, it's already in CS and we have been using it for a while.
> What do you mean by secure KVM migration? My code reads configuration values that let you set up a TLS peer-to-peer connection between the agents and transfer all the data over it using the features in libvirt. That's the setup we have in production.
>
> For the graceful shutdown, we have an HAProxy in front, so we just edit the configuration to turn off one MS. We also check manually that there isn't any snapshot ongoing before launching the stop-start. But I don't find this very robust. Therefore I read a lot of the code managing the agents and how the agents are connected to the MS. There is already a command to rebalance agents between MS, so I'm developing a solution around that.
>
> Kind regards,
> Marc-Aurèle
>
>
>> On 02 Jul 2016, at 02:03, ilya wrote:
>>
>> Marco,
>>
>> I have written a tiny shell script that does the following:
>>
>> Makes sure there are no async_jobs running, and also blocks 8080 via
>> iptables, to keep users from connecting to an MS that's about to go down.
>>
>> It needs a bit of enhancement - and should look up the MSID of that
>> specific server. It looks something like this - consider borrowing
>> concepts if applicable..
>>
>>> #!/bin/bash
>>> DATESTAMP=$(date +%m%d%y-%H%M%S)
>>> DBPASS=$(java -classpath /usr/share/cloudstack-common/lib/jasypt-1.9.0.jar org.jasypt.intf.cli.JasyptPBEStringDecryptionCLI input="$(cat /etc/cloudstack/management/db.properties | grep db.cloud.password | awk -F'(' '{print $2}' | sed 's/)//g')" password="$(cat /etc/cloudstack/management/key)" | grep -A2 OUTPUT | tail -1)
>>> DBHOST=$(cat /etc/cloudstack/management/db.properties | grep db.cloud.host | awk -F'=' '{print $2}' | tail -1 )
>>> DBUSER=$(cat /etc/cloudstack/management/db.properties | grep db.cloud.username | awk -F'=' '{print $2}')
>>> DB=$(cat /etc/cloudstack/management/db.properties | grep db.cloud.name | awk -F'=' '{print $2}')
>>> DBPORT=$(cat /etc/cloudstack/management/db.properties | grep db.cloud.port | awk -F'=' '{print $2}')
>>> MYSQLCMD="mysql -h $DBHOST -u $DBUSER -P $DBPORT -p$DBPASS $DB"
>>> #echo $DBPASS $DBHOST $DBUSER $DB $DBPORT
>>>
>>>
>>> JOBS=$(echo 'SELECT * FROM cloud.async_job where job_status=0 and job_dispatcher not like "pseudoJobDispatcher"' | $MYSQLCMD | wc -l)
>>>
>>> if [ $JOBS -gt 0 ]
>>> then
>>>     echo "WARN: Looks like i have active jobs in flight, please try again later"
>>>     echo 'SELECT * FROM cloud.async_job where job_status=0 and job_dispatcher not like "pseudoJobDispatcher"' | $MYSQLCMD
>>>     exit
>>> else
>>>     echo "NOTE: No jobs running, good to go!"
>>>     echo "NOTE: Blocking incoming 8080"
>>>     /sbin/iptables -A INPUT -p tcp --destination-port 8080 -j DROP
>>>     service cloudstack-management stop
>>>     CSPID=$(cat /var/run/cloudstack-management.pid)
>>>     # Force-kill the process if it is still alive after the stop request
>>>     ps -p $CSPID >/dev/null 2>&1 && kill -9 $CSPID
>>>     # Braces rather than a subshell, so that "exit 1" ends the whole script
>>>     ps -p $CSPID >/dev/null 2>&1 && { echo "ERROR: Could not terminate cloudstack service on $(hostname) with pid $CSPID"; /sbin/iptables -D INPUT -p tcp --destination-port 8080 -j DROP; exit 1; }
>>>     service cloudstack-management start
>>>     echo "NOTE: Unblocking incoming 8080"
>>>     /sbin/iptables -D INPUT -p tcp --destination-port 8080 -j DROP
>>> fi
>>
>> Regards,
>> ilya
>>
>> On 7/1/16 3:30 AM, marco@exoscale.ch wrote:
>>> Hi,
>>>
>>> I can't edit the page but I'll be glad to put some effort into V5:
>>> - Live migration for KVM
>>> - Improve logging using UUIDs (as I already did part of that for us at exoscale)
>>>
>>> I'm in the process of adding another feature we need: graceful shutdown of a management server when running a cluster of MS. The goal is to send a "prepareForShutdown" command to one or more MS and have them rebalance their agents to the ones still running, so that no command is lost. Then there shouldn't be any downtime for any agent during an update.
>>>
>>> Kind regards,
>>> Marc-Aurèle
>>>
>>> PS: Is there any architectural discussion going on in the Slack channel? I saw that the IRC is not so active...
>>>
>>>
>>>> On 01 Jul 2016, at 11:55, Paul Angus wrote:
>>>>
>>>> There's not been much response to this, but I'll start clearing away the unclaimed items; people can always add them back.
>>>>
>>>>
>>>> Kind regards,
>>>>
>>>> Paul Angus
>>>>
>>>>
>>>> paul.angus@shapeblue.com
>>>> www.shapeblue.com
>>>> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
>>>> @shapeblue
>>>>
>>>>
>>>>
>>>
>
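
A rough sketch, not taken from the thread, of how the job check in the quoted script might be tightened while it remains a shell POC. It reads db.properties keys by exact match instead of grep, and asks MySQL for a row count instead of piping rows through wc -l. The helper names (get_prop, count_active_jobs, no_active_jobs) and the MYSQL_CMD array are hypothetical. The query reuses only the table and columns from the original script. It assumes plain key=value lines with no spaces around "=", and it does not decrypt ENC(...) values.

```shell
#!/bin/bash
# Sketch only: all function names and MYSQL_CMD are illustrative assumptions.

# Print the value of an exactly matching key from a key=value properties file.
# Everything after the first '=' is kept, and the last definition wins.
get_prop() {
    local key="$1" file="$2"
    awk -F'=' -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$file"
}

# Print the number of in-flight async jobs, using the same filter as the
# quoted script. MYSQL_CMD must be an array holding the mysql client
# invocation (host, user, port, password, database).
count_active_jobs() {
    "${MYSQL_CMD[@]}" -N -B -e \
        "SELECT COUNT(*) FROM cloud.async_job WHERE job_status=0 AND job_dispatcher NOT LIKE 'pseudoJobDispatcher'"
}

# Succeed only when the count is exactly 0. A failed query or an empty
# result is treated as "not safe", so the check fails closed.
no_active_jobs() {
    local n
    n=$(count_active_jobs) || return 2
    [ "$n" -eq 0 ]
}
```

Usage might look like: set PROPS=/etc/cloudstack/management/db.properties, build MYSQL_CMD=(mysql -h "$(get_prop db.cloud.host "$PROPS")" -u "$(get_prop db.cloud.username "$PROPS")" -P "$(get_prop db.cloud.port "$PROPS")" "-p$DBPASS" "$(get_prop db.cloud.name "$PROPS")"), with DBPASS obtained as in the original script, and then run no_active_jobs before blocking port 8080 and stopping the service.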