On Sun, May 9, 2010 at 11:54 PM, Stefan Seelmann <seelmann@apache.org> wrote:
No objection at all.

I updated XDBM to use an <ID> type parameter to be flexible for
different ID types. The reason was that I wanted to use UUID for the
HBase partition. If we used UUIDs for all partitions, we could remove
that type parameter again.


Hmmm, I guess you've already tried this out with the HBase partition? I was wondering how it worked, since incrementing the long is what updates the value stored on disk. I would have expected the ID parameter to extend Number or something similar.
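One way a store could stay generic in its ID type without requiring `ID extends Number` is to delegate ID creation to a per-partition generator. This is a minimal sketch under that assumption; `IdGenerator`, `LongIdGenerator`, `UuidIdGenerator` and `ToyStore` are hypothetical names, not actual XDBM classes, and the write-back of the Long counter to disk is only indicated in a comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical abstraction: each partition supplies its own ID factory. */
interface IdGenerator<ID> {
    ID next();
}

/** Long IDs: a counter whose current value a real store would persist. */
final class LongIdGenerator implements IdGenerator<Long> {
    private final AtomicLong counter;

    LongIdGenerator(long lastStoredId) {
        this.counter = new AtomicLong(lastStoredId);
    }

    @Override
    public Long next() {
        // A real partition would also write the new value back to disk here.
        return counter.incrementAndGet();
    }
}

/** UUID IDs: random (version 4) UUIDs, so no shared counter is needed. */
final class UuidIdGenerator implements IdGenerator<UUID> {
    @Override
    public UUID next() {
        return UUID.randomUUID();
    }
}

/** A toy store that is generic in its ID type, like an <ID>-parameterized store. */
final class ToyStore<ID> {
    private final IdGenerator<ID> ids;
    private final Map<ID, String> entries = new HashMap<>();

    ToyStore(IdGenerator<ID> ids) {
        this.ids = ids;
    }

    /** Assigns a fresh ID to the entry and returns it. */
    ID add(String dn) {
        ID id = ids.next();
        entries.put(id, dn);
        return id;
    }

    String lookup(ID id) {
        return entries.get(id);
    }
}

public class IdDemo {
    public static void main(String[] args) {
        ToyStore<Long> longStore = new ToyStore<>(new LongIdGenerator(41L));
        ToyStore<UUID> uuidStore = new ToyStore<>(new UuidIdGenerator());

        Long a = longStore.add("ou=people,dc=example,dc=com");
        UUID b = uuidStore.add("ou=people,dc=example,dc=com");

        System.out.println(a);                   // 42
        System.out.println(uuidStore.lookup(b)); // ou=people,dc=example,dc=com
    }
}
```

With this split, the store code never does arithmetic on IDs; only the Long generator knows about incrementing.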

Alex
 

Alex Karasulu wrote:
> Hi all,
>
> Any thoughts about using the globally visible UUID as the primary key for
> entries in the XDBM partition design, instead of a partition-specific
> Long ID?
>
> I'm thinking we will need this one day to implement certain features.
> Let me list them and also point out why using the globally unique UUID
> might be advantageous:
>
> (1) System wide DN and Entry Cache
>
>       Rather than having each partition manage its own cache, a central
> DN and Entry cache makes sense. In this case a global identifier for an
> entry might come in handy as the key for hashing cached values.
>
> (2) Nested Partitions, Default Root Partition, Hash Partitioning and
> Range Partitioning
>
>       At some point we will want to have nestable partitions. This means
> we can have one ADS Partition mounted under another ADS Partition, with
> operations routed properly to the nested partition where appropriate.
>
>       Nested partitions will also allow us to have a default root
> partition under which we can mount other partitions. The default root
> partition is nice to have since it allows us to add administrative areas
> and their administrative points with subentries at the empty-string root
> DN. It also means the RootDSE would be stored persistently in this
> partition. Right now the RootDSE is generated and not mutable.
>
>       Hash partitioning and range partitioning entail distributing
> entries across partitions under some container entry based on some
> value. Hash partitioning uses the value's hash to distribute entries,
> whereas range partitioning uses ranges of values to distribute the
> entries. So it is not the DN that determines which partition the entry
> is pushed into, but this hash or range value. This lets us scale to very
> large numbers of entries in the DIT while also distributing the disk
> access load across several disk spindles, as Oracle's RDBMS does in
> these kinds of configurations.
>
> (3) Global Indices
>
>       If we use a globally unique UUID instead of a partition-specific
> Long ID, then we can expose index segments managed by partitions to
> higher layers to construct global indices. These global indices can
> then be used to conduct searches one level above the partitions. This
> makes it possible for us to implement certain virtual directory
> strategies regardless of the partition implementations used in a
> server's configuration. The XDBM search algorithm can leverage these
> global indices, or delegate a sub-partition search to a partition if
> that partition uses its own search mechanism. There's a lot to be said
> here, but this is neither the time nor the place to expand on this
> topic. But global indices are a key factor for several things,
> including virtualization.
>
> Thoughts?
>
> --
> Alex Karasulu
> My Blog :: http://www.jroller.com/akarasulu/
> Apache Directory Server :: http://directory.apache.org
> Apache MINA :: http://mina.apache.org
> To set up a meeting with me: http://tungle.me/AlexKarasulu
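The routing and global-index points in the proposal above both depend on entry IDs that are unique across partitions. Here is a minimal sketch, assuming string attribute values normalized to lower case and plain maps standing in for partition index segments; `GlobalIdSketch`, `hashPartition`, `rangePartition` and `mergeSegments` are hypothetical names, not ApacheDS or XDBM APIs. It needs Java 9+ for `Map.of`/`Set.of`/`List.of`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.UUID;

public class GlobalIdSketch {

    /** Hash partitioning: the attribute value's hash selects one of n partitions. */
    static int hashPartition(String value, int n) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(value.hashCode(), n);
    }

    /**
     * Range partitioning: each partition owns values from its lower bound (inclusive)
     * up to the next bound. The map must contain "" so every value has a floor entry;
     * values are assumed to be lower case so the ranges sort as expected.
     */
    static String rangePartition(String value, NavigableMap<String, String> byLowerBound) {
        return byLowerBound.floorEntry(value).getValue();
    }

    /**
     * Global index: the union of per-partition index segments (value -> entry IDs).
     * A plain union is only safe because UUIDs do not collide across partitions;
     * two partitions could both hand out the Long ID 1.
     */
    static Map<String, Set<UUID>> mergeSegments(List<Map<String, Set<UUID>>> segments) {
        Map<String, Set<UUID>> global = new HashMap<>();
        for (Map<String, Set<UUID>> segment : segments) {
            for (Map.Entry<String, Set<UUID>> e : segment.entrySet()) {
                global.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
            }
        }
        return global;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> ranges = new TreeMap<>();
        ranges.put("", "partition-a-g");
        ranges.put("h", "partition-h-o");
        ranges.put("p", "partition-p-z");
        System.out.println(rangePartition("jsmith", ranges)); // partition-h-o
        System.out.println(hashPartition("jsmith", 4));       // an index in [0, 4)

        // Two partitions each index the cn value "smith" for a different entry.
        UUID e1 = UUID.randomUUID();
        UUID e2 = UUID.randomUUID();
        Map<String, Set<UUID>> seg1 = Map.of("smith", Set.of(e1));
        Map<String, Set<UUID>> seg2 = Map.of("smith", Set.of(e2));
        Map<String, Set<UUID>> global = mergeSegments(List.of(seg1, seg2));
        System.out.println(global.get("smith").size()); // 2
    }
}
```

The same collision-freedom is what would let a central DN/Entry cache use the UUID directly as its hash key, without qualifying it by partition.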