From: Alex Karasulu <akarasulu@gmail.com>
To: Apache Directory Developers List <dev@directory.apache.org>
Date: Sat, 8 May 2010 12:49:09 +0300
Subject: Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK
References: <4BE51C52.7030900@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001485f95ffa6221110486121627

--001485f95ffa6221110486121627
Content-Type: text/plain; charset=ISO-8859-1

On Sat, May 8, 2010 at 12:36 PM, Kiran Ayyagari <kayyagari@apache.org> wrote:
On Sat, May 8, 2010 at 11:09 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:
> On 5/8/10 9:43 AM, Alex Karasulu wrote:
>>
>> Hi all,
>>
>> Any thoughts about using the globally visible UUID in the XDBM partition
>> design as the primary key for entries, instead of a partition-specific
>> Long ID?
>>
>> I think we will need it one day to implement certain features. Let me list
>> them and also point out why using the globally unique UUID might be
>> advantageous:
>>
>> (1) System wide DN and Entry Cache
>>
>>       Rather than having each partition manage its own cache, a central DN
>> and Entry cache makes sense. In this case a global identifier for an entry
>> might come in handy as the key for hashing cached values.
>>
>> (2) Nested Partitions, Default Root Partition, Hash Partitioning and Range
>> Partitioning
>>
>>       At some point we will want to have nestable partitions. This means
>> we can have one ADS Partition mounted under another ADS Partition, with
>> operations properly routed to the nested partition where appropriate.
>>
>>       Nested partitions will also allow us to have a default root
>> partition under which we can mount other partitions. The default root
>> partition is nice to have since it allows us to add administrative areas,
>> and their administrative points with subentries, onto the root (empty
>> string) DN. It also means the RootDSE is stored in this partition
>> properly, with persistence. Right now the RootDSE is generated and not
>> mutable.
>>
>>       Hash partitioning and range partitioning entail distributing
>> entries across partitions under some container entry based on some value.
>> Hash partitioning uses the value's hash to distribute entries, whereas
>> range partitioning uses ranges of values. So it's not really the DN that
>> determines which partition the entry is pushed into, but this hash or
>> range value. This lets us scale to very large numbers of entries in the
>> DIT while also distributing the disk access load across several disk
>> spindles, as Oracle's RDBMS does in these kinds of configurations.
>>
>> (3) Global Indices
>>
>>       If we use a globally unique UUID instead of a partition-specific
>> Long ID, then we can expose index segments managed by partitions to higher
>> layers to construct global indices. These global indices can then be used
>> to conduct searches outside of the partition, one step higher. This makes
>> it possible for us to implement certain virtual directory strategies
>> regardless of the partition implementations used in a server's
>> configuration. The XDBM search algorithm can leverage these global indices,
>> or delegate a sub-partition search to a partition if that partition uses
>> its own search mechanism. There's a lot to be said here, but this is
>> neither the time nor the place to expand on this topic. Still, global
>> indices are a key factor for several things, including virtualization.
>>
>> Thoughts?
>>
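Point (2) says the routing value, not the DN, decides which partition receives an entry. The Java sketch that follows shows the two routing rules side by side. It is a minimal illustration under stated assumptions: the class `PartitionRouter`, its methods, and the alphabetical split of `uid` values across three partitions are hypothetical and not part of ApacheDS.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical routing helpers (not an ApacheDS API): choose which child
// partition under a container entry should hold a new entry.
final class PartitionRouter {

    private PartitionRouter() {}

    // Hash partitioning: the routing value's hash, reduced modulo the number
    // of partitions. floorMod keeps the index non-negative for negative hashes.
    static int byHash(String value, int partitionCount) {
        return Math.floorMod(value.hashCode(), partitionCount);
    }

    // Range partitioning: each partition owns the values from its lower bound
    // (inclusive) up to the next partition's lower bound (exclusive).
    static int byRange(String value, TreeMap<String, Integer> lowerBounds) {
        Map.Entry<String, Integer> owner = lowerBounds.floorEntry(value);
        if (owner == null) {
            throw new IllegalArgumentException("no range covers " + value);
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        // Three partitions split alphabetically: [a, h), [h, p), [p, ...)
        TreeMap<String, Integer> bounds = new TreeMap<>();
        bounds.put("a", 0);
        bounds.put("h", 1);
        bounds.put("p", 2);

        for (String uid : List.of("alice", "kiran", "zoe")) {
            // The DN plays no part in either decision; only the routing value does.
            System.out.println(uid + " -> hash: " + byHash(uid, 3)
                    + ", range: " + byRange(uid, bounds));
        }
    }
}
```

Hash routing tends to spread values evenly but scatters neighbouring values across partitions; range routing keeps neighbouring values together at the cost of possibly uneven partition sizes.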
>
> One other advantage will be that we won't need to store an increment on
> disk anymore. At the moment, each time we add an element to the backend, we
> have to ask for a Long, which has to be unique. This is potentially a
> bottleneck, and it's costly, as this unique Long has to be stored on disk.
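Emmanuel's point can be illustrated by contrasting the two allocation strategies. This is a minimal Java sketch, assuming (as the message describes) that each new Long must be persisted before it is handed out; the disk write is modeled by a hypothetical `persist` callback, and `IdGenerators` is an illustrative name, not an ApacheDS class. `java.util.UUID.randomUUID()` is the standard JDK call for a random (version 4) UUID.

```java
import java.util.UUID;
import java.util.function.LongConsumer;

// Hypothetical ID generators (not ApacheDS classes) contrasting the two keys.
final class IdGenerators {

    private IdGenerators() {}

    // Partition-specific Long IDs: every allocation mutates shared state that
    // must also be persisted, so concurrent adds serialize on this lock.
    static final class PersistedCounter {
        private long next;
        private final LongConsumer persist; // stands in for a disk write

        PersistedCounter(long start, LongConsumer persist) {
            this.next = start;
            this.persist = persist;
        }

        synchronized long nextId() {
            long id = next++;
            persist.accept(next); // store the new high-water mark before returning
            return id;
        }
    }

    // Global IDs: a random (version 4) UUID needs no lock and no stored counter.
    static UUID newEntryId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        long[] diskWrites = {0};
        PersistedCounter counter = new PersistedCounter(1L, v -> diskWrites[0]++);
        System.out.println(counter.nextId());                // 1
        System.out.println(counter.nextId());                // 2
        System.out.println("disk writes: " + diskWrites[0]); // disk writes: 2
        System.out.println("uuid version: " + newEntryId().version()); // uuid version: 4
    }
}
```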
Besides this, I see some more advantages:

*If* we keep the entry's entryUUID also as the ID of the entry, then
building the DN using the RDN index will be a lot easier (because finding
the parent of an entry now requires a full DN construction, which can be
avoided by doing a reverse lookup in the RDN index if we know the entry's
ID).
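Kiran's reverse lookup can be sketched as follows, assuming a hypothetical index record that maps an entry's ID to its parent's ID and its own RDN. `ReverseRdnIndex` and its layout are illustrative only, not the actual ApacheDS RDN index, and RDN escaping is ignored. Because every record is keyed by the entry's own ID, each step up the tree is a single keyed lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical reverse RDN index (not the ApacheDS on-disk format): maps an
// entry ID to its parent's ID and its own RDN.
final class ReverseRdnIndex {

    private static final class Link {
        final UUID parentId; // null when the entry sits directly under the root
        final String rdn;

        Link(UUID parentId, String rdn) {
            this.parentId = parentId;
            this.rdn = rdn;
        }
    }

    private final Map<UUID, Link> links = new HashMap<>();

    void add(UUID id, UUID parentId, String rdn) {
        links.put(id, new Link(parentId, rdn));
    }

    // Builds the DN with one keyed lookup per level, walking from the entry up
    // to the root; no ancestor's full DN has to be constructed first.
    String buildDn(UUID id) {
        List<String> rdns = new ArrayList<>();
        for (UUID current = id; current != null; ) {
            Link link = links.get(current);
            if (link == null) {
                throw new IllegalStateException("no index record for " + current);
            }
            rdns.add(link.rdn);
            current = link.parentId;
        }
        return String.join(",", rdns); // leaf RDN first, as in LDAP DN strings
    }

    public static void main(String[] args) {
        ReverseRdnIndex index = new ReverseRdnIndex();
        UUID com = UUID.randomUUID();
        UUID example = UUID.randomUUID();
        UUID people = UUID.randomUUID();
        UUID kiran = UUID.randomUUID();
        index.add(com, null, "dc=com");
        index.add(example, com, "dc=example");
        index.add(people, example, "ou=people");
        index.add(kiran, people, "uid=kiran");
        System.out.println(index.buildDn(kiran)); // uid=kiran,ou=people,dc=example,dc=com
    }
}
```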

>
> I don't yet see any other negative impact from using UUID instead of
> Long, except that it will require (slightly) more disk space.
Yep, and the RDN index also takes more disk space now.


Yeah, but this disk space is negligible. Basically the UUID is 16 bytes and the Long is 8 on Intel arch. We're talking about 8 extra bytes per ID here, so no need to even worry about it. The benefits will outweigh the disadvantages if this is all we can see for disadvantages.
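The byte counts are easy to check in Java. A `long` is 64 bits on every Java platform, so the 8-byte figure does not depend on the CPU architecture, and `java.util.UUID` exposes its 128 bits as two 64-bit halves via `getMostSignificantBits()` and `getLeastSignificantBits()`. This is a minimal sketch, assuming each key is stored as a fixed-width big-endian byte array (an assumption; the actual XDBM/JDBM serializers may differ); `KeySizes` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helpers comparing the serialized size of the two candidate keys.
final class KeySizes {

    static final int LONG_KEY_BYTES = Long.BYTES;     // 8
    static final int UUID_KEY_BYTES = 2 * Long.BYTES; // 16: two 64-bit halves

    private KeySizes() {}

    // Big-endian 8-byte encoding of a Long ID.
    static byte[] encodeLong(long id) {
        return ByteBuffer.allocate(LONG_KEY_BYTES).putLong(id).array();
    }

    // 16-byte encoding of a UUID: most significant half first, then least.
    static byte[] encodeUuid(UUID id) {
        return ByteBuffer.allocate(UUID_KEY_BYTES)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Extra bytes when every stored ID reference becomes a UUID instead of a Long.
    static long extraBytes(long idReferences) {
        return idReferences * (UUID_KEY_BYTES - LONG_KEY_BYTES);
    }

    public static void main(String[] args) {
        System.out.println(encodeLong(42L).length);               // 8
        System.out.println(encodeUuid(UUID.randomUUID()).length); // 16
        System.out.println(extraBytes(1_000_000L));               // 8000000
    }
}
```

The overhead is constant per stored ID, so the total grows linearly with the number of places an entry ID is recorded (for example, index records): one million such references cost 8,000,000 extra bytes.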


Regards,
--
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me:
http://tungle.me/AlexKarasulu
--001485f95ffa6221110486121627--