Date: Mon, 16 Feb 2015 15:39:02 -0500
Subject: Re: Your thoughts on using Primary Storage for keeping snapshots
From: Ian Rae <irae@cloudops.com>
To: "dev@cloudstack.apache.org" <dev@cloudstack.apache.org>

Agree with Logan. As fans of Ceph as well as SolidFire, we are interested in
seeing this particular use case (RBD/KVM) being well implemented. However, the
concept of volume snapshots residing only on primary storage vs. being
transferred to secondary storage is a more generally useful one that is worth
solving with the same terminology and interfaces, even if the mechanisms may
be specific to the storage type and hypervisor. If it's not practical then
it's not practical, but it seems like it would be worth trying.
On Mon, Feb 16, 2015 at 1:02 PM, Logan Barfield wrote:
> Hi Mike,
>
> I agree it is a general CloudStack issue that can be addressed across
> multiple primary storage options. It's a two-stage issue, since some
> changes will need to be implemented to support these features across
> the board, and others will need to be made to each storage option.
>
> It would be nice to see a single issue opened to cover this across all
> available storage options. Maybe have a community vote on what
> support they want to see, and not consider the feature complete until
> all of the desired options are implemented? That would slow down
> development for sure, but it would ensure that it was supported where
> it needs to be.
>
> Thank You,
>
> Logan Barfield
> Tranquil Hosting
>
>
> On Mon, Feb 16, 2015 at 12:42 PM, Mike Tutkowski wrote:
> > For example, Punith from CloudByte sent out an e-mail yesterday that was
> > very similar to this thread, but he was wondering how to implement such
> > a concept on his company's SAN technology.
> >
> > On Mon, Feb 16, 2015 at 10:40 AM, Mike Tutkowski <
> > mike.tutkowski@solidfire.com> wrote:
> >
> >> Yeah, I think it's a similar concept, though.
> >>
> >> You would want to take snapshots on Ceph (or some other backend system
> >> that acts as primary storage) instead of copying data to secondary
> >> storage and calling it a snapshot.
> >>
> >> For Ceph or any other backend system like that, the idea is to speed up
> >> snapshots by not requiring CPU cycles on the front end or network
> >> bandwidth to transfer the data.
> >>
> >> In that sense, this is a general-purpose CloudStack problem, and it
> >> appears you are intending to discuss only the Ceph implementation here,
> >> which is fine.
> >>
> >> On Mon, Feb 16, 2015 at 10:34 AM, Logan Barfield <lbarfield@tqhosting.com>
> >> wrote:
> >>
> >>> Hi Mike,
> >>>
> >>> I think the interest in this issue is primarily for Ceph RBD, which
> >>> doesn't use iSCSI or SAN concepts in general. As well, I believe RBD
> >>> is only currently supported in KVM (and VMware?). QEMU has native RBD
> >>> support, so it attaches the devices directly to the VMs in question.
> >>> It also natively supports snapshotting, which is what this discussion
> >>> is about.
> >>>
> >>> Thank You,
> >>>
> >>> Logan Barfield
> >>> Tranquil Hosting
> >>>
> >>>
> >>> On Mon, Feb 16, 2015 at 11:46 AM, Mike Tutkowski wrote:
> >>> > I should have also commented on KVM (since that was the hypervisor
> >>> > called out in the initial e-mail).
> >>> >
> >>> > In my situation, most of my customers use XenServer and/or ESXi, so
> >>> > KVM has received the fewest of my cycles with regards to those three
> >>> > hypervisors.
> >>> >
> >>> > KVM, though, is actually the simplest hypervisor for which to
> >>> > implement these changes (since I am using the iSCSI adapter of the
> >>> > KVM agent and it just essentially passes my LUN to the VM in
> >>> > question).
> >>> >
> >>> > For KVM, there is no clustered file system applied to my backend
> >>> > LUN, so I don't have to "worry" about that layer.
> >>> >
> >>> > I don't see any hurdles like *immutable* UUIDs of SRs and VDIs (such
> >>> > is the case with XenServer) or having to re-signature anything (such
> >>> > is the case with ESXi).
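
For reference, "taking the snapshot on Ceph" instead of copying data to
secondary storage is a single metadata-only call. A minimal sketch with the
python-rbd bindings; the pool name, image name, and snapshot name below are
placeholders, and the cluster is assumed to be described by
/etc/ceph/ceph.conf:

    import rados
    import rbd

    # Connect to the Ceph cluster that backs the primary storage pool.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("cloudstack")           # placeholder pool name
        try:
            with rbd.Image(ioctx, "volume-1234") as image:  # placeholder image name
                # A copy-on-write snapshot: only metadata is written, so it
                # completes in seconds regardless of the volume size and uses
                # no hypervisor CPU or network bandwidth.
                image.create_snap("cs-snapshot-1")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
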
> >>> >
> >>> > On Mon, Feb 16, 2015 at 9:33 AM, Mike Tutkowski <
> >>> > mike.tutkowski@solidfire.com> wrote:
> >>> >
> >>> >> I have been working on this on and off for a while now (as time
> >>> >> permits).
> >>> >>
> >>> >> Here is an e-mail I sent to a customer of ours that helps describe
> >>> >> some of the issues:
> >>> >>
> >>> >> *** Beginning of e-mail ***
> >>> >>
> >>> >> The main requests were around the following features:
> >>> >>
> >>> >> * The ability to leverage SolidFire snapshots.
> >>> >>
> >>> >> * The ability to create CloudStack templates from SolidFire
> >>> >> snapshots.
> >>> >>
> >>> >> I had these on my roadmap, but bumped the priority up and began
> >>> >> work on them for the CS 4.6 release.
> >>> >>
> >>> >> During design, I realized there were issues with the way XenServer
> >>> >> is architected that prevented me from directly using SolidFire
> >>> >> snapshots.
> >>> >>
> >>> >> I could definitely take a SolidFire snapshot of a SolidFire volume,
> >>> >> but this snapshot would not be usable from XenServer if the
> >>> >> original volume was still in use.
> >>> >>
> >>> >> Here is the gist of the problem:
> >>> >>
> >>> >> When XenServer leverages an iSCSI target such as a SolidFire
> >>> >> volume, it applies a clustered file system to it, which they call
> >>> >> a storage repository (SR). An SR has an *immutable* UUID associated
> >>> >> with it.
> >>> >>
> >>> >> The virtual volume (which a VM sees as a disk) is represented by a
> >>> >> virtual disk image (VDI) in the SR. A VDI also has an *immutable*
> >>> >> UUID associated with it.
> >>> >>
> >>> >> If I take a snapshot (or a clone) of the SolidFire volume and then
> >>> >> later try to use that snapshot from XenServer, XenServer complains
> >>> >> that the SR on the snapshot has a UUID that conflicts with an
> >>> >> existing UUID.
> >>> >>
> >>> >> In other words, it is not possible to use the original SR and the
> >>> >> snapshot of this SR from XenServer at the same time, which is
> >>> >> critical in a cloud environment (to enable creating templates from
> >>> >> snapshots).
> >>> >>
> >>> >> The way I have proposed circumventing this issue is not ideal, but
> >>> >> it technically works (this code is checked into the CS 4.6 branch):
> >>> >>
> >>> >> When the time comes to take a CloudStack snapshot of a CloudStack
> >>> >> volume that is backed by SolidFire storage via the storage plug-in,
> >>> >> the plug-in will create a new SolidFire volume with characteristics
> >>> >> (size and IOPS) equal to those of the original volume.
> >>> >>
> >>> >> We then have XenServer attach to this new SolidFire volume, create
> >>> >> a *new* SR on it, and then copy the VDI from the source SR to the
> >>> >> destination SR (the new SR).
> >>> >>
> >>> >> This leads to us having a copy of the VDI (a "snapshot" of sorts),
> >>> >> but it requires CPU cycles on the compute cluster as well as
> >>> >> network bandwidth to write to the SAN (thus it is slower and more
> >>> >> resource intensive than a SolidFire snapshot).
> >>> >>
> >>> >> I spoke with Tim Mackey (who works on XenServer at Citrix)
> >>> >> concerning this issue before and during the CloudStack
> >>> >> Collaboration Conference in Budapest in November. He agreed that
> >>> >> this is a legitimate issue with the way XenServer is designed and
> >>> >> could not think of a way (other than what I was doing) to get
> >>> >> around it in current versions of XenServer.
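
The copy step in the workaround described above maps onto XenAPI's VDI.copy.
A minimal sketch using the XenAPI Python bindings; the host address,
credentials, and UUIDs are placeholders, and the destination SR is assumed to
have already been created on the newly provisioned SolidFire volume:

    import XenAPI

    # Placeholders: pool master address and credentials.
    session = XenAPI.Session("https://xenserver-pool-master")
    session.xenapi.login_with_password("root", "password")
    try:
        # The CloudStack volume's VDI on the original SR, and the new SR that
        # was just created on the new SolidFire volume.
        src_vdi = session.xenapi.VDI.get_by_uuid("source-vdi-uuid")
        dst_sr = session.xenapi.SR.get_by_uuid("destination-sr-uuid")

        # Full copy of the virtual disk into the destination SR. This is the
        # step that burns hypervisor CPU and SAN bandwidth, which is why it is
        # so much slower than a native SolidFire snapshot would be.
        new_vdi = session.xenapi.VDI.copy(src_vdi, dst_sr)
        print("Copied VDI:", session.xenapi.VDI.get_uuid(new_vdi))
    finally:
        session.xenapi.session.logout()
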
> >>> >>
> >>> >> One thought is to have a feature added to XenServer that enables
> >>> >> you to change the UUID of an SR and of a VDI.
> >>> >>
> >>> >> If I could do that, then I could take a SolidFire snapshot of the
> >>> >> SolidFire volume and issue commands to XenServer to have it change
> >>> >> the UUIDs of the original SR and the original VDI. I could then
> >>> >> record the necessary UUID info in the CS DB.
> >>> >>
> >>> >> *** End of e-mail ***
> >>> >>
> >>> >> I have since investigated this on ESXi.
> >>> >>
> >>> >> ESXi does have a way for us to "re-signature" a datastore, so
> >>> >> backend snapshots can be taken and effectively used on this
> >>> >> hypervisor.
> >>> >>
> >>> >> On Mon, Feb 16, 2015 at 8:19 AM, Logan Barfield <
> >>> >> lbarfield@tqhosting.com> wrote:
> >>> >>
> >>> >>> I'm just going to stick with the qemu-img option change for RBD
> >>> >>> for now (which should cut snapshot time down drastically), and
> >>> >>> look forward to this in the future. I'd be happy to help get this
> >>> >>> moving, but I'm not enough of a developer to lead the charge.
> >>> >>>
> >>> >>> As far as renaming goes, I agree that maybe "backups" isn't the
> >>> >>> right word. That being said, calling a full-sized copy of a volume
> >>> >>> a "snapshot" also isn't the right word. Maybe "image" would be
> >>> >>> better?
> >>> >>>
> >>> >>> I've also got my reservations about "accounts" vs "users" (I think
> >>> >>> "departments" and "accounts or users" respectively is less
> >>> >>> confusing), but that's a different thread.
> >>> >>>
> >>> >>> Thank You,
> >>> >>>
> >>> >>> Logan Barfield
> >>> >>> Tranquil Hosting
> >>> >>>
> >>> >>>
> >>> >>> On Mon, Feb 16, 2015 at 10:04 AM, Wido den Hollander <wido@widodh.nl>
> >>> >>> wrote:
> >>> >>> >
> >>> >>> >
> >>> >>> > On 16-02-15 15:38, Logan Barfield wrote:
> >>> >>> >> I like this idea a lot for Ceph RBD. I do think there should
> >>> >>> >> still be support for copying snapshots to secondary storage as
> >>> >>> >> needed (for transfers between zones, etc.). I really think that
> >>> >>> >> this could be part of a larger move to clarify the naming
> >>> >>> >> conventions used for disk operations. Currently "Volume
> >>> >>> >> Snapshots" should probably really be called "Backups". So
> >>> >>> >> having "snapshot" functionality, and a "convert snapshot to
> >>> >>> >> backup/template" operation, would be a good move.
> >>> >>> >>
> >>> >>> >
> >>> >>> > I fully agree that this would be a great addition.
> >>> >>> >
> >>> >>> > I won't be able to work on this any time soon though.
> >>> >>> >
> >>> >>> > Wido
> >>> >>> >
> >>> >>> >> Thank You,
> >>> >>> >>
> >>> >>> >> Logan Barfield
> >>> >>> >> Tranquil Hosting
> >>> >>> >>
> >>> >>> >>
> >>> >>> >> On Mon, Feb 16, 2015 at 9:16 AM, Andrija Panic <
> >>> >>> >> andrija.panic@gmail.com> wrote:
> >>> >>> >>> BIG +1
> >>> >>> >>>
> >>> >>> >>> My team should submit a patch to ACS for better KVM snapshots,
> >>> >>> >>> including whole-VM snapshots, etc., but it's too early to give
> >>> >>> >>> details...
> >>> >>> >>> best
> >>> >>> >>>
> >>> >>> >>> On 16 February 2015 at 13:01, Andrei Mikhailovsky <
> >>> >>> >>> andrei@arhont.com> wrote:
> >>> >>> >>>
> >>> >>> >>>> Hello guys,
> >>> >>> >>>>
> >>> >>> >>>> I was hoping to have some feedback from the community on the
> >>> >>> >>>> subject of having an ability to keep snapshots on the primary
> >>> >>> >>>> storage where it is supported by the storage backend.
> >>> >>> >>>>
> >>> >>> >>>> The idea behind this functionality is to improve how
> >>> >>> >>>> snapshots are currently handled on KVM hypervisors with Ceph
> >>> >>> >>>> primary storage. At the moment, the snapshots are taken on
> >>> >>> >>>> the primary storage and then copied to the secondary storage.
> >>> >>> >>>> This method is very slow and inefficient even on a small
> >>> >>> >>>> infrastructure. Even on medium deployments, using snapshots
> >>> >>> >>>> in KVM becomes nearly impossible. If you have tens or
> >>> >>> >>>> hundreds of concurrent snapshots taking place, you will have
> >>> >>> >>>> a bunch of timeouts and errors, your network becomes clogged,
> >>> >>> >>>> etc. In addition, using these snapshots for creating new
> >>> >>> >>>> volumes or reverting VMs is also slow and inefficient. As
> >>> >>> >>>> above, when you have tens or hundreds of concurrent
> >>> >>> >>>> operations, they will not succeed and you will have a
> >>> >>> >>>> majority of tasks ending in errors or timeouts.
> >>> >>> >>>>
> >>> >>> >>>> At the moment, taking a single snapshot of relatively small
> >>> >>> >>>> volumes (200GB or 500GB, for instance) takes tens if not
> >>> >>> >>>> hundreds of minutes. Taking a snapshot of the same volume on
> >>> >>> >>>> Ceph primary storage takes a few seconds at most! Similarly,
> >>> >>> >>>> converting a snapshot to a volume takes tens if not hundreds
> >>> >>> >>>> of minutes when secondary storage is involved, compared with
> >>> >>> >>>> seconds if done directly on the primary storage.
> >>> >>> >>>>
> >>> >>> >>>> I suggest that CloudStack should have the ability to keep
> >>> >>> >>>> volume snapshots on the primary storage where this is
> >>> >>> >>>> supported by the storage. Perhaps have a per-primary-storage
> >>> >>> >>>> setting that enables this functionality. This would be
> >>> >>> >>>> beneficial for Ceph primary storage on KVM hypervisors, and
> >>> >>> >>>> perhaps on XenServer when Ceph is supported there in the
> >>> >>> >>>> near future.
> >>> >>> >>>>
> >>> >>> >>>> This will greatly speed up the process of using snapshots on
> >>> >>> >>>> KVM, and users will actually start using snapshotting rather
> >>> >>> >>>> than giving up in frustration.
> >>> >>> >>>>
> >>> >>> >>>> I have opened the ticket CLOUDSTACK-8256, so please cast your
> >>> >>> >>>> vote if you are in agreement.
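
Creating a new volume from a snapshot that stays on Ceph primary storage is
likewise metadata-only (a copy-on-write clone). A minimal sketch with the
python-rbd bindings, reusing the placeholder names from the earlier sketch; it
assumes format-2 RBD images with the layering feature, and the parent snapshot
must be protected before cloning:

    import rados
    import rbd

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("cloudstack")            # placeholder pool name
        try:
            # Protect the snapshot so it can serve as a clone parent.
            with rbd.Image(ioctx, "volume-1234") as image:
                image.protect_snap("cs-snapshot-1")

            # The clone is copy-on-write: no data moves, so the new volume is
            # usable in seconds instead of the minutes-to-hours a round trip
            # through secondary storage takes.
            rbd.RBD().clone(ioctx, "volume-1234", "cs-snapshot-1",
                            ioctx, "volume-from-snap",
                            features=rbd.RBD_FEATURE_LAYERING)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

If the new volume should not depend on its parent snapshot, it can later be
flattened (rbd flatten / Image.flatten), at which point the data copy happens
inside the Ceph cluster rather than through the hypervisor.
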
> >>> >>> >>>>
> >>> >>> >>>> Thanks for your input
> >>> >>> >>>>
> >>> >>> >>>> Andrei
> >>> >>> >>>>
> >>> >>> >>>
> >>> >>> >>> --
> >>> >>> >>>
> >>> >>> >>> Andrija Panić
> >>> >>>
> >>> >>
> >>> >> --
> >>> >> *Mike Tutkowski*
> >>> >> *Senior CloudStack Developer, SolidFire Inc.*
> >>> >> e: mike.tutkowski@solidfire.com
> >>> >> o: 303.746.7302
> >>> >> Advancing the way the world uses the cloud
> >>> >> *™*
>

--
*Ian Rae*
PDG *|* CEO
t *514.944.4008*

*CloudOps* Votre partenaire infonuagique *|* Cloud Solutions Experts
w cloudops.com *|* 420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6