directory-dev mailing list archives

From Kiran Ayyagari <kayyag...@apache.org>
Subject Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK
Date Sat, 08 May 2010 09:36:19 GMT
On Sat, May 8, 2010 at 11:09 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:
> On 5/8/10 9:43 AM, Alex Karasulu wrote:
>>
>> Hi all,
>>
>> Any thoughts about using the globally visible UUID in the XDBM partition
>> design as the primary key for Entries instead of using a partition
>> specific Long ID?
>>
>> I'm thinking that one day we will need to implement certain features. Let
>> me list them and also point out why using the globally unique UUID might
>> be advantageous:
>>
>> (1) System wide DN and Entry Cache
>>
>>       Rather than having each partition manage its own cache, a central
>> DN and Entry cache makes sense. In this case a global identifier for an
>> entry might come in handy for hashing cached values.
>>
>> (2) Nested Partitions, Default Root Partition, Hash Partitioning and Range
>> Partitioning
>>
>>       At some point we will want to have nestable partitions. This means
>> we can have one ADS Partition mounted under another ADS Partition, with
>> operation routing taking place properly to the nested partition where
>> appropriate.
>>
>>       Nested partitions will also allow us to have a default root
>> partition from which we can mount other partitions.  The default root
>> partition is nice to have since it allows us to add administrative areas
>> and their administrative points with subentries onto the root empty
>> string DN.  It also means the RootDSE is stored in this partition
>> properly, with persistence.  Right now the RootDSE is generated and not
>> mutable.
>>
>>       Hash partitioning and range partitioning entail distributing
>> entries across partitions under some container entry based on some
>> value. Hash partitioning uses the value's hash to distribute entries,
>> whereas range partitioning uses ranges of values to distribute the
>> entries.  So it's not really the DN that determines which partition the
>> entry is pushed into but this hash or range value. This makes it so we
>> can scale to very large numbers of entries in the DIT while also
>> distributing the disk access load across several disk spindles, as
>> Oracle's RDBMS does in these kinds of configurations.
>>
>> (3) Global Indices
>>
>>       If we use a globally unique UUID instead of a partition specific
>> Long ID then we can expose index segments managed by partitions to
>> higher layers to construct global indices.  These global indices can
>> then be used to conduct searches outside of the partition, one step
>> higher.  This makes it possible for us to implement certain virtual
>> directory strategies regardless of the partition implementations used in
>> a server's configuration.  The XDBM search algorithm can leverage these
>> global indices or delegate sub partition search to a partition if a
>> partition uses its own search mechanism.  There's a lot to be said here
>> but this is neither the time nor the place to expand on this topic. But
>> global indices are a key factor for several things including
>> virtualization.
>>
>> Thoughts?
>>
>
> One other advantage is that we will no longer need to store an
> incrementing counter on disk. At the moment, each time we add an element
> in the backend, we have to ask for a Long, which has to be unique. This is
> potentially a bottleneck, and it's costly, as this unique Long has to be
> stored on disk.
Besides this, I see another advantage.

*If* we also use the entry's entryUUID as the ID of the entry, then
building the DN using the RDN index becomes a lot easier: finding the
parent of an entry currently requires a full DN construction, which can be
avoided by doing a reverse lookup in the RDN index if we know the entry's ID.
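
A minimal Java sketch of that reverse lookup, assuming a hypothetical
in-memory index keyed by entry UUID where each record holds the entry's RDN
and its parent's ID. The class and method names (RdnIndexSketch, Node, add,
parentOf, buildDn) are illustrative only and are not the ApacheDS RDN index
API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of a reverse RDN index; not the ApacheDS API. */
public class RdnIndexSketch {

    /** One reverse record: the entry's own RDN plus its parent's ID. */
    static final class Node {
        final UUID parentId; // null for the suffix (top) entry
        final String rdn;

        Node(UUID parentId, String rdn) {
            this.parentId = parentId;
            this.rdn = rdn;
        }
    }

    // Reverse index: entry ID -> (parent ID, RDN)
    private final Map<UUID, Node> reverse = new HashMap<>();

    /** Adds an entry under parentId (null for the suffix) and returns its new ID. */
    public UUID add(UUID parentId, String rdn) {
        UUID id = UUID.randomUUID();
        reverse.put(id, new Node(parentId, rdn));
        return id;
    }

    /** Finds the parent with a single lookup, without building any DN. */
    public UUID parentOf(UUID id) {
        Node node = reverse.get(id);
        return node == null ? null : node.parentId;
    }

    /** Builds the DN by walking parent links from the entry up to the suffix. */
    public String buildDn(UUID id) {
        StringBuilder dn = new StringBuilder();
        UUID current = id;
        while (current != null) {
            Node node = reverse.get(current);
            if (node == null) {
                throw new IllegalArgumentException("unknown entry ID: " + current);
            }
            if (dn.length() > 0) {
                dn.append(',');
            }
            dn.append(node.rdn);
            current = node.parentId;
        }
        return dn.toString();
    }
}
```

In this sketch the parent is one map lookup away, and the DN is assembled
only when a caller actually asks for it.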

>
> I don't yet see any other negative impact from using a UUID instead of a
> Long, except that it will require (slightly) more disk space.
Yep, and the RDN index also takes more disk space now.
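
To make the trade-off discussed above concrete, here is a small Java sketch
using only java.util.UUID and java.nio.ByteBuffer. It shows an ID being
generated without any shared, persisted counter, and the 16-byte key that
results, compared with 8 bytes for a long. The class and method names
(EntryIdSketch, newEntryId, toBytes, fromBytes) are hypothetical, and the
key layout is an assumption, not how any ApacheDS partition serializes IDs.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical sketch of UUID-based entry IDs; not ApacheDS code. */
public class EntryIdSketch {

    /** Size of a serialized UUID key in bytes (two 64-bit halves). */
    public static final int UUID_KEY_BYTES = 2 * Long.BYTES;

    /** Generates an ID locally; no counter has to be read or written on disk. */
    public static UUID newEntryId() {
        return UUID.randomUUID();
    }

    /** Serializes a UUID to a fixed 16-byte key (assumed layout: msb, then lsb). */
    public static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(UUID_KEY_BYTES)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    /** Rebuilds the UUID from the 16-byte key written by toBytes. */
    public static UUID fromBytes(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }
}
```

Every index that stores the entry ID pays the extra 8 bytes per key, which
is where the additional disk space comes from.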

Kiran Ayyagari
