Mailing-List: contact dev-help@directory.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Apache Directory Developers List" <dev@directory.apache.org>
MIME-Version: 1.0
Sender: akarasulu@gmail.com
In-Reply-To: <AANLkTikL8Deyi32FnJ2JbcHcT1MhqaEjpGKvxv2M_uAC@mail.gmail.com>
References: <AANLkTinyIgLJ_lflgnlgqFRTMGDt40z4uVUKD11kI_HN@mail.gmail.com>
	 <4BE51C52.7030900@gmail.com>
	 <AANLkTikL8Deyi32FnJ2JbcHcT1MhqaEjpGKvxv2M_uAC@mail.gmail.com>
Date: Sat, 8 May 2010 12:49:09 +0300
Message-ID: <AANLkTim5GsNogVx-CezUs9ZU6sZC-w_OZFbiFMGrHzi_@mail.gmail.com>
Subject: Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of
	partition specific Long ID PK
From: Alex Karasulu <akarasulu@apache.org>
To: Apache Directory Developers List <dev@directory.apache.org>
Content-Type: multipart/alternative; boundary=001485f95ffa6221110486121627

--001485f95ffa6221110486121627
Content-Type: text/plain; charset=ISO-8859-1

On Sat, May 8, 2010 at 12:36 PM, Kiran Ayyagari <kayyagari@apache.org> wrote:

> On Sat, May 8, 2010 at 11:09 AM, Emmanuel Lecharny <elecharny@gmail.com>
> wrote:
> > On 5/8/10 9:43 AM, Alex Karasulu wrote:
> >>
> >> Hi all,
> >>
> >> Any thoughts about using the globally visible UUID in the XDBM partition
> >> design as the primary key for Entries instead of a partition-specific
> >> Long ID?
> >>
> >> I'm thinking of certain features we will need to implement one day. Let
> >> me list them and point out why using the globally unique UUID might be
> >> advantageous:
> >>
> >> (1) System-wide DN and Entry Cache
> >>
> >>       Rather than having each partition manage its own cache, a central
> >> DN and Entry cache makes sense. In this case a global identifier for an
> >> entry might come in handy as the key for hashing cached values.
> >>
> >> (2) Nested Partitions, Default Root Partition, Hash Partitioning and
> >> Range Partitioning
> >>
> >>       At some point we will want to have nestable partitions. This means
> >> we can have one ADS Partition mounted under another ADS Partition, with
> >> operations routed properly to the nested partition where appropriate.
> >>
> >>       Nested partitions will also allow us to have a default root
> >> partition from which we can mount other partitions. The default root
> >> partition is nice to have since it allows us to add administrative areas
> >> and their administrative points, with subentries, onto the root empty
> >> string DN. It also means the RootDSE can be stored in this partition
> >> properly, with persistence. Right now the RootDSE is generated and not
> >> mutable.
> >>
> >>       Hash partitioning and range partitioning entail distributing
> >> entries across partitions under some container entry based on some
> >> value. Hash partitioning uses the value's hash to distribute entries,
> >> whereas range partitioning uses ranges of values. So it's not really the
> >> DN that determines which partition an entry is pushed into, but this
> >> hash or range value. This lets us scale to very large numbers of entries
> >> in the DIT while also distributing the disk access load across several
> >> disk spindles, as Oracle's RDBMS does in these kinds of configurations.
> >>
> >> (3) Global Indices
> >>
> >>       If we use a globally unique UUID instead of a partition-specific
> >> Long ID, then we can expose index segments managed by partitions to
> >> higher layers to construct global indices. These global indices can then
> >> be used to conduct searches one step above the partition. This makes it
> >> possible for us to implement certain virtual directory strategies
> >> regardless of the partition implementations used in a server's
> >> configuration. The XDBM search algorithm can leverage these global
> >> indices, or delegate sub-partition search to a partition if that
> >> partition uses its own search mechanism. There's a lot to be said here,
> >> but this is neither the time nor the place to expand on this topic.
> >> Still, global indices are a key factor for several things, including
> >> virtualization.
> >>
> >> Thoughts?
> >>
> >
> > One other advantage is that we will no longer need to store an
> > incrementing counter on disk. At the moment, each time we add an entry to
> > the backend, we have to ask for a Long, which has to be unique. This is
> > potentially a bottleneck, and it's costly, as this unique Long has to be
> > stored on disk.
>
> Besides this, I see some more advantages.
>
> *If* we also keep the entryUUID of an entry as the entry's ID, then
> building the DN using the RDN index will be a lot easier (finding the
> parent of an entry currently requires a full DN construction, which can be
> avoided by doing a reverse lookup in the RDN index if we know the entry's
> ID).
>
> >
> > I don't yet see any other negative impact from using a UUID instead of a
> > Long, except that it will require (slightly) more disk space.
>
> Yep, and the RDN index will also take more disk space.
>
>
Yeah, but the extra disk space is negligible. A UUID is 16 bytes and a Long
is 8, so we're talking about 8 extra bytes per ID. No need to even worry
about it. If this is the only disadvantage we can see, the benefits will
outweigh it.
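
To make the size comparison concrete, here is a minimal Java sketch. It is not
ApacheDS code: the class and method names (UuidKeySketch, toBytes, fromBytes)
are hypothetical and only illustrate how a UUID could be serialized as a
16-byte key, and that generating one does not require a counter persisted on
disk.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical illustration only; not part of the XDBM partition code.
class UuidKeySketch {

    // Serialize a UUID as its two 64-bit halves: 16 bytes in total.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from a 16-byte key (most significant half first).
    static UUID fromBytes(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        // No shared, persisted sequence is consulted to obtain a new ID.
        UUID id = UUID.randomUUID();
        byte[] key = toBytes(id);

        System.out.println("UUID key size in bytes: " + key.length);
        System.out.println("Long key size in bytes: " + Long.BYTES);
        System.out.println("Round trip matches:     " + id.equals(fromBytes(key)));
    }
}
```

The 8 extra bytes apply wherever the ID is stored, so every index that
references entries by ID pays the same small overhead.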


Regards,
-- 
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu

--001485f95ffa6221110486121627--