directory-dev mailing list archives

From Alex Karasulu <akaras...@apache.org>
Subject Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK
Date Sat, 08 May 2010 09:49:09 GMT
On Sat, May 8, 2010 at 12:36 PM, Kiran Ayyagari <kayyagari@apache.org> wrote:

> On Sat, May 8, 2010 at 11:09 AM, Emmanuel Lecharny <elecharny@gmail.com>
> wrote:
> > On 5/8/10 9:43 AM, Alex Karasulu wrote:
> >>
> >> Hi all,
> >>
> >> Any thoughts about using the globally visible UUID in the XDBM partition
> >> design as the primary key for Entries instead of using a
> >> partition-specific Long ID?
> >>
> >> I'm thinking that one day we will need to implement certain features.
> >> Let me list them and also point out why using the globally unique UUID
> >> might be advantageous:
> >>
> >> (1) System wide DN and Entry Cache
> >>
> >>       Rather than having each partition manage its own cache, a central
> >> DN and Entry cache makes sense. In this case a global identifier for an
> >> entry might come in handy for hashing cached values.
> >>
> >> (2) Nested Partitions, Default Root Partition, Hash Partitioning and
> >> Range Partitioning
> >>
> >>       At some point we will want to have nestable partitions. This means
> >> we can have one ADS Partition mounted under another ADS Partition with
> >> operation routing taking place properly to the nested partition where
> >> appropriate.
> >>
> >>       Nested partitions will also allow us to have a default root
> >> partition from which we can mount other partitions.  The default root
> >> partition is nice to have since it allows us to add administrative areas
> >> and their administrative points with subentries onto the root empty
> >> string DN.  It also means the RootDSE can be stored in this partition
> >> properly with persistence.  Right now the RootDSE is generated and not
> >> mutable.
> >>
> >>       Hash partitioning and range partitioning entail distributing
> >> entries across partitions under some container entry based on some
> >> value. Hash partitioning uses the value's hash to distribute entries,
> >> whereas range partitioning uses ranges of values to distribute the
> >> entries.  So it's not really the DN that determines which partition the
> >> entry is pushed into but this hash or range value. This lets us scale to
> >> very large numbers of entries in the DIT while also distributing the
> >> disk access load across several disk spindles, as Oracle's RDBMS does in
> >> these kinds of configurations.
> >>
> >> (3) Global Indices
> >>
> >>       If we use a globally unique UUID instead of a partition-specific
> >> Long ID then we can expose index segments managed by partitions to
> >> higher layers to construct global indices.  These global indices can
> >> then be used to conduct searches outside of the partition, one step
> >> higher.  This makes it possible for us to implement certain virtual
> >> directory strategies regardless of the partition implementations used
> >> in a server's configuration.  The XDBM search algorithm can leverage
> >> these global indices or delegate sub-partition search to a partition if
> >> that partition uses its own search mechanism.  There's a lot to be said
> >> here but this is neither the time nor the place to expand on this
> >> topic. But global indices are a key factor for several things,
> >> including virtualization.
> >>
> >> Thoughts?
> >>
> >
> > One other advantage is that we will no longer need to store an
> > incrementing counter on disk. At the moment, each time we add an element
> > in the backend, we have to ask for a Long, which has to be unique. This
> > is potentially a bottleneck, and it's costly, as this unique Long has to
> > be stored on disk.
> Besides this, I see some more advantages.
>
> *If* we also use the entryUUID as the ID of the entry, then building the DN
> using the RDN index will be a lot easier (because finding the parent of an
> entry currently requires a full DN construction, which can be avoided by
> doing a reverse lookup in the RDN index if we know the entry's ID).
>
> >
> > I don't yet see any other negative impact from using UUID instead of
> > Long, except that it will require (slightly) more disk space.
> Yep, and the RDN index also takes more disk space now.
>
>
Yeah, but this extra disk space is negligible. The UUID is 16 bytes and the
Long is 8, so we're talking about 8 extra bytes per ID here. No need to even
worry about it. The benefits will outweigh the disadvantages if this is the
only disadvantage we can see.
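
To make the size arithmetic concrete, here is a rough Java sketch of how
the two key types might be serialized. This is not ApacheDS code: the class
EntryKeySize and its methods are hypothetical names made up for this
illustration, and a real index may encode keys differently.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper (not part of ApacheDS): compares the serialized
// size of a UUID key with that of a Long key.
final class EntryKeySize {

    // Packs a UUID into its 16-byte binary form (two 64-bit halves).
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Packs a Long ID into its 8-byte binary form.
    static byte[] toBytes(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // Restores a UUID from the 16-byte form produced by toBytes(UUID).
    static UUID fromBytes(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println("UUID key bytes: " + toBytes(id).length);
        System.out.println("Long key bytes: " + toBytes(42L).length);
    }
}
```

Note that UUID.randomUUID() needs no shared counter, which is the point
about not having to persist an increment on disk.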

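On the system-wide cache point, a minimal sketch of why a global ID helps,
assuming entries are reduced to their DN strings for brevity. The class
GlobalEntryCache is a hypothetical name, not an existing ApacheDS API. With
partition-specific Long IDs, two partitions can both hand out ID 1, so a
shared cache would need a (partition, id) composite key. With UUIDs a
single map is enough.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not the ApacheDS cache): one server-wide cache
// keyed directly by the entry's UUID.
final class GlobalEntryCache {

    // A UUID is unique across partitions, so no composite key is needed.
    private final Map<UUID, String> dnByEntryId = new ConcurrentHashMap<>();

    // Caches the DN for an entry, whichever partition it lives in.
    void put(UUID entryId, String dn) {
        dnByEntryId.put(entryId, dn);
    }

    // Returns the cached DN, or null if the entry is not cached.
    String get(UUID entryId) {
        return dnByEntryId.get(entryId);
    }

    // Number of cached entries across all partitions.
    int size() {
        return dnByEntryId.size();
    }
}
```

An eviction policy is left out here on purpose. The sketch only shows the
keying.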

Regards,
-- 
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu
