Date: Fri, 3 Jun 2011 09:46:31 -0400
From: Chris Lalancette
To: deltacloud-dev@incubator.apache.org
Subject: Re: Length of instance names in Deltacloud
Message-ID: <20110603134631.GD7533@localhost.localdomain>
In-Reply-To: <4DE7EBF7.4080700@redhat.com>

On 06/02/11 - 10:00:55PM, Tomas Von Veschler wrote:
> Thanks for the detailed explanation, really helps me understand
> better the internals. Some comments below:
>
> On 06/01/2011 10:44 PM, Chris Lalancette wrote:
> >On 06/01/11 - 09:37:20PM, Tomas Von Veschler wrote:
> >>Hi Chris,
> >>
> >>One point that doesn't convince me too much is to use the condor id
> >>as instance name. If for example using rhev, a sys admin will prefer
> >>to have the VMs named like the user named it in Aeolus, not really a
> >>bunch of numbers/hash.
> >
> >Actually, I agree with you to a large extent. However, I have not been able
> >to convince upstream condor (where I have to get condor patches accepted) about
> >this. The reasoning is that the condor name is used as a unique
> >handle to the instance. Consider:
> >
> >1) Aeolus submits the job to condor.
> >2) Condor generates the name (something like Condor__uniquejobid).
> >3) Condor writes the name to the internal database; note that it does *not*
> >have an ID yet, because we haven't submitted the job to deltacloudd yet.
> >4) Condor submits the job to deltacloudd.
> >5) Before it can get the status back from deltacloudd, condor crashes.
> >6) On restart, condor can find the job again by looking up the unique name
> >that it generated. It can then start monitoring the job again.
>
> I understand the problem now.
> A question, after it retrieves the job
> (=vm info?) in (6), does it switch to use VM IDs in advance or does
> it keep searching by name?

Yeah, it actually uses the ID once it knows it. The name is only used in
the crash-recovery case.

>
> >It's not possible to do 6) if you use the user-generated name, because it is
> >not necessarily unique enough.
>
> But what about the triplet: provider_id # realm_id # vm_name ? For
> rhev that'd be unique. I mean Condor would have a way to reach the
> job again by decomposing the triplet.

So what I've done here is to actually change condor to generate UUIDs.
Those end up being 36 bytes long, which is short enough for RHEV-M and most
other clouds. For the clouds where this is still too long, we will just
truncate the UUID as appropriate. That loses some uniqueness, but I think
it should be OK for the most part.

>
> To note: it's possible to change the name of a VM at the virt layer.
> That's why using vm ids sounds more robust than names to me.

Yeah, this is a problem. If the user goes in after condor launches the job
and changes the name, it could cause problems. But the name isn't the only
thing that could cause a problem here; there are many things that a user
could do to a VM "out-of-band" that would cause us to fail. For the moment
we are defining it away, but it is something we will eventually have to
deal with.

>
> Hope I'm not overcomplicating things, if we talk about managing
> hundreds+ VMs, the name of VMs at the virt layer is much less
> important than at a small rhev deployment (where btw probably won't
> use cloud anyway, or Aeolus target is also smaller virt deployments?
> I don't know :-)

I'm honestly not sure. I don't quite know the benefit of managing your own
cloud at a very small scale; it would seem you would just want to use the
underlying virt platform. But that doesn't mean I'm right :).

It's all a bit of a thorny problem, which is why it hasn't been solved yet.
I'm not entirely satisfied with the solution I outlined above, but at the
moment I don't see a better way around it.

-- 
Chris Lalancette
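
As an illustration of the UUID-and-truncate naming described above, here is
a minimal Python sketch. It is not the condor patch itself: the function
name make_instance_name, the max_length parameter, and the choice of random
(version 4) UUIDs are assumptions made for the example, and the real
per-cloud name limits are not stated in this thread.

```python
import uuid

# Canonical hyphenated UUID text: 32 hex digits plus 4 hyphens.
UUID_STRING_LENGTH = 36


def make_instance_name(max_length=UUID_STRING_LENGTH):
    """Return a fresh UUID string, cut to max_length characters if needed.

    max_length is a hypothetical stand-in for a provider's instance-name
    limit; it is not a value taken from any real cloud.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    # Assumption: a random (version 4) UUID; the thread does not say
    # which UUID variant condor generates.
    name = str(uuid.uuid4())
    # Truncation keeps the leading characters and drops the rest, so
    # fewer distinct names are possible than with the full UUID.
    return name[:max_length]


if __name__ == "__main__":
    print(make_instance_name())    # full 36-character name
    print(make_instance_name(15))  # shortened for a stricter limit
```

A name produced this way would only be needed for the crash-recovery lookup
in step 6; once the backend ID is known, the ID would be used instead.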