directory-dev mailing list archives

From "Alex Karasulu" <akaras...@apache.org>
Subject Re: [ApacheDS][Mitosis] Replication data
Date Sat, 01 Dec 2007 20:49:48 GMT
Hi Martin,

I must first apologize for taking so long to respond to you.  I just wanted
to clear enough time to look into these issues and think properly about
them.

On Nov 22, 2007 6:19 PM, Martin Alderson <equim@planetquake.com> wrote:

> Hi all,
>
> I am currently looking into some of the replication issues, specifically
> DIRSERVER-894 ("Older concurrent changes are never replicated"),
> DIRSERVER-1097 ("Only send net changes during replication") and
> DIRSERVER-1101 ("New replicas may never receive some recent
> modifications").
>
> I think these issues will require changing the replication data format.
>  Currently the replication logs are stored in a single database table
> with time, replica ID, sequence number and operation columns.  The first
> 3 comprise the CSN and the last is for a serialised operation object.
>
> DIRSERVER-894 needs a way to work out the CSN at the point a specific
> attribute was last modified.  DIRSERVER-1097 needs a way to find
> previous log entries based on entryUUID, modification type and attribute
> ID.  We are also planning on moving the replication data to the DIT.
> Given all this I am thinking of removing the serialised operation blob
> and replacing it with extra table(s) for each operation type storing the
> operation's data across multiple columns.  This will allow us to
> efficiently query the replication logs based on the operation data.
>
> Perhaps this would be a good time to make the jump to storing the
> replication data in the DIT.


Before we think about storing it in the DIT, let's think about the best
possible storage and format for this data.  I have a feeling it might be
better to write a custom high-capacity store implementation that answers
very specific questions and is highly optimized for them.  I'm thinking it
might be worth writing a custom BTree-based store rather than a partition,
which is a bit more general.

This store can hard-code and optimize the way these specific questions are
answered using indices.  Furthermore, we have all the dependencies we need
for this.  We can also build a partition wrapper around the store interface
that maps it onto the LDAP namespace.
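
As a rough illustration of what hard-coding the questions into indices could
look like, here is a minimal in-memory sketch in Java.  It assumes a CSN that
orders by time, then replica ID, then sequence number (the three columns
Martin lists), and keeps one secondary index from (entryUUID, attributeID) to
the newest CSN.  All the names (Csn, InMemoryReplicationLog, lastModified,
changesAfter) are made up for the example and are not existing ApacheDS or
Mitosis APIs, and a real store would sit on BTrees on disk rather than on a
TreeMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical CSN: ordered by time, then replica ID, then sequence number. */
final class Csn implements Comparable<Csn> {
    final long time;
    final String replicaId;
    final int sequence;

    Csn(long time, String replicaId, int sequence) {
        this.time = time;
        this.replicaId = replicaId;
        this.sequence = sequence;
    }

    @Override
    public int compareTo(Csn o) {
        int c = Long.compare(time, o.time);
        if (c == 0) c = replicaId.compareTo(o.replicaId);
        if (c == 0) c = Integer.compare(sequence, o.sequence);
        return c;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Csn && compareTo((Csn) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(time, replicaId, sequence);
    }
}

/** Hypothetical log: a primary sorted map by CSN plus one purpose-built index. */
final class InMemoryReplicationLog {
    // Primary index: CSN -> description of the operation (a String for brevity).
    private final TreeMap<Csn, String> byCsn = new TreeMap<Csn, String>();
    // Secondary index: "entryUUID|attributeID" -> newest CSN that touched it.
    private final Map<String, Csn> newestByAttribute = new HashMap<String, Csn>();

    void logAttributeChange(Csn csn, String entryUuid, String attributeId, String op) {
        byCsn.put(csn, op);
        String key = entryUuid + "|" + attributeId;
        Csn known = newestByAttribute.get(key);
        // Only move the index forward; older changes may arrive late.
        if (known == null || csn.compareTo(known) > 0) {
            newestByAttribute.put(key, csn);
        }
    }

    /** CSN of the newest logged change to this attribute, or null if none. */
    Csn lastModified(String entryUuid, String attributeId) {
        return newestByAttribute.get(entryUuid + "|" + attributeId);
    }

    /** Every logged operation strictly newer than the given CSN, in CSN order. */
    SortedMap<Csn, String> changesAfter(Csn csn) {
        return byCsn.tailMap(csn, false);
    }
}
```

With something like this, the DIRSERVER-894 question ("what was the CSN when
this attribute was last modified?") becomes a single index lookup instead of
a scan over serialised operation blobs.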

I know that having yet another non-optional user partition for this is a
bit of overkill, since we already have the system and schema partitions.
However, these concerns should ease once we allow for nestable partitions
and do away with the nexus concept, which is tracked in this issue:

    https://issues.apache.org/jira/browse/DIRSERVER-465

I think this feature will be implemented at some point before 2.0 but until
then we can work on just making a special Partition implementation which
wraps the ReplicationStore interface.  This way any store implementation can
be exposed by the DIT simply by adding the partition to some context in the
DIT.

> It seems that that would be well suited to
> storing the operations in an "exploded" format.  I am thinking of the
> following kind of format:
>
> ou=logs/
>   cn=<csn>/
>       objectClass: ... (indicates operation type)
>       time: ...
>       replicaID: ...
>       operationSequence: ...
>       entryUUID: ...
>       attributeID: <attributeName> (for attribute modifications)
>       cn=attributes/
>         <attributeName>: <attributeValues>
>
> The biggest concern I have for this is the inflexibility of LDAP
> searches.  Do we have a sort control in ApacheDS?


Yeah, don't worry about that; I can make sure we can search efficiently.  I
can talk to you more about your specific search needs.  Just model a solid
JDBC schema that does what you need to do replication right.  We can easily
derive an LDAP schema from it, with the appropriate changes to make the
questions you need answered fast and easy.  Keep in mind, though, that you
can also ask questions through the store interface directly, so you don't
need to go through an LDAP interface, although you can do both.
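
To make the "you can do both" point concrete, here is a small Java sketch,
assuming the ou=logs / cn=<csn> layout from your proposal.  The same store is
queried directly and through a thin wrapper that resolves a DN.  The
interface and class names (ReplicationStoreSketch, MapBackedStore,
LogPartitionWrapper) and their methods are invented for illustration; they
are not the actual ReplicationStore or Partition interfaces, and the DN
handling is deliberately naive string matching.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical store interface; the real ReplicationStore may look different. */
interface ReplicationStoreSketch {
    /** Attributes of the operation logged under this CSN, or null if unknown. */
    Map<String, String> lookup(String csn);
}

/** Trivial in-memory implementation, standing in for a BTree-backed store. */
final class MapBackedStore implements ReplicationStoreSketch {
    private final Map<String, Map<String, String>> entries =
            new TreeMap<String, Map<String, String>>();

    void put(String csn, String entryUuid, String attributeId) {
        Map<String, String> attrs = new LinkedHashMap<String, String>();
        attrs.put("entryUUID", entryUuid);
        attrs.put("attributeID", attributeId);
        entries.put(csn, attrs);
    }

    @Override
    public Map<String, String> lookup(String csn) {
        return entries.get(csn);
    }
}

/** Hypothetical wrapper exposing the store under ou=logs, loosely like a partition. */
final class LogPartitionWrapper {
    private static final String SUFFIX = ",ou=logs";
    private final ReplicationStoreSketch store;

    LogPartitionWrapper(ReplicationStoreSketch store) {
        this.store = store;
    }

    /** Resolves "cn=<csn>,ou=logs" by delegating to the store; null otherwise. */
    Map<String, String> lookup(String dn) {
        if (!dn.startsWith("cn=") || !dn.endsWith(SUFFIX)) {
            return null;
        }
        String csn = dn.substring("cn=".length(), dn.length() - SUFFIX.length());
        return store.lookup(csn);
    }
}
```

The replication code would call the store directly, while LDAP clients would
only ever see the wrapper; both paths end up at the same data.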

> Also, if we have the
> attributes for the operation in a child entry how can we find an
> operation in the logs based on those attributes.
>

I think we can use LDAP backlinks for such associations, which is another
feature we're working on.  However, we might just want to build the right
store implementation using JDBM and, for now, just keep in mind how we can
wrap the store interface as a partition to expose it through LDAP.


>
> At the same time I am thinking about a couple of things in the
> replication system that don't seem to be necessary.
>
> Firstly, once DIRSERVER-894 is fixed, I don't think we will need the
> entryCSN attribute.  I believe that it is only used to check whether an
> operation should be applied to an entry or not (i.e. is it a new
> modification), but this is broken and we need to check the CSN per
> attribute by using the logs instead.
>

Right, no problem; if you want to axe it we can do that.  Oh, this reminds me
that we also need to make sure we're generating UUIDs all the time, even if
replication is not enabled.  We want to have the entryUUID as an operational
attribute of all entries so that things work when replication is turned on.
We can also use the UUID for many other things.

>
> Secondly, I don't really see the point of "tombstoning" entries (marking
> them as deleted instead of really deleting them).  The only time I can
> see it having any kind of effect is when a replica receives a
> modification for an entry it thinks has been deleted - then it will
> resurrect it.  This seems like a very bad idea to me.  I would expect
> this to be a fatal replication error as something has gone seriously
> wrong.
>

I've got to admit that I'm not well versed enough in this topic to answer
you on this, but I do know that it is a valid technique that is widely
practiced in replication theory.  For example, it's used in Active
Directory.  So I would recommend researching this topic a little bit, but
I'm open to anything as long as we are educated about it.
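
For what it's worth, one argument often made for tombstones is that they let
a replica tell "this entry was deleted at CSN X" apart from "I have never
heard of this entry".  The Java sketch below only illustrates that
distinction.  It is not how Mitosis or Active Directory actually behave, CSNs
are reduced to plain longs, and the class name, method names and outcome
names are all made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical replica state; a sketch of the tombstone idea, not Mitosis code. */
final class TombstoneSketch {
    enum Outcome { APPLIED, IGNORED_STALE, CONFLICT_AFTER_DELETE, UNKNOWN_ENTRY }

    // entryUUID -> CSN of the last change applied to a live entry.
    private final Map<String, Long> live = new HashMap<String, Long>();
    // entryUUID -> CSN of the delete, kept instead of forgetting the entry.
    private final Map<String, Long> tombstones = new HashMap<String, Long>();

    void add(String uuid, long csn) {
        live.put(uuid, csn);
    }

    void delete(String uuid, long csn) {
        if (live.remove(uuid) != null) {
            tombstones.put(uuid, csn);
        }
    }

    Outcome modify(String uuid, long csn) {
        Long current = live.get(uuid);
        if (current != null) {
            live.put(uuid, Math.max(csn, current.longValue()));
            return Outcome.APPLIED;
        }
        Long deletedAt = tombstones.get(uuid);
        if (deletedAt == null) {
            // No tombstone: we cannot tell a deleted entry from a missing one.
            return Outcome.UNKNOWN_ENTRY;
        }
        // With a tombstone we can order the modification against the delete.
        return csn < deletedAt.longValue()
                ? Outcome.IGNORED_STALE
                : Outcome.CONFLICT_AFTER_DELETE;
    }
}
```

In this sketch a modification older than the delete can be dropped quietly,
and only a modification newer than the delete needs a policy decision, which
could just as well be the fatal error you describe as a resurrection.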


> Sorry for the long email... if anyone's managed to read this far any
> comments would be much appreciated.
>

Hey, it took me a while, sorry for that.  This is a very important topic
that we need to get right.  I also have a couple of other points I want to
touch on.

(1) I think it would be really nice to be able to replicate with OpenLDAP
and also learn about the sync replication mechanism used.  Perhaps they have
some nice techniques which we have not thought of yet.

(2) I know OpenLDAP leverages a changelog that is similar to, but not
exactly the same as, our changelog.  Perhaps we need to explore this
relationship and figure out how to better leverage our changelog.  I think
the CSN is synonymous with a revision, except that revisions are local and
CSNs are global.

Alex
