directory-dev mailing list archives

From Martin Alderson <eq...@planetquake.com>
Subject [ApacheDS][Mitosis] Replication data
Date Thu, 22 Nov 2007 23:19:15 GMT
Hi all,

I am currently looking into some of the replication issues, specifically 
DIRSERVER-894 ("Older concurrent changes are never replicated"), 
DIRSERVER-1097 ("Only send net changes during replication") and 
DIRSERVER-1101 ("New replicas may never receive some recent modifications").

I think these issues will require changing the replication data format. 
Currently the replication logs are stored in a single database table 
with time, replica ID, sequence number and operation columns.  The first 
three make up the CSN and the last holds a serialised operation object.
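
To make the current layout concrete, here is a rough sketch in Python 
(illustration only, not Mitosis code).  The names CSN, LogRow and 
in_csn_order are made up, and I am assuming that CSNs compare column by 
column in the order time, replica ID, sequence number:

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CSN:
    """Hypothetical CSN: compared field by field, in declaration order."""
    time: int          # timestamp of the change
    replica_id: str    # replica that originated the change
    sequence: int      # tie-breaker for changes with the same timestamp


@dataclass(frozen=True)
class LogRow:
    """One row of the current log table: a CSN plus an opaque blob."""
    csn: CSN
    operation: bytes   # serialised operation, cannot be queried


def in_csn_order(rows):
    """Return the log rows sorted by CSN."""
    return sorted(rows, key=lambda row: row.csn)


rows = [
    LogRow(CSN(200, "B", 0), b"op-3"),
    LogRow(CSN(100, "A", 1), b"op-2"),
    LogRow(CSN(100, "A", 0), b"op-1"),
]
```

The point is that the only thing we can search or sort on today is the 
CSN itself; everything about the operation is hidden inside the blob.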

DIRSERVER-894 needs a way to work out the CSN at the point a specific 
attribute was last modified.  DIRSERVER-1097 needs a way to find 
previous log entries based on entryUUID, modification type and attribute 
ID.  We are also planning on moving the replication data to the DIT. 
Given all this I am thinking of removing the serialised operation blob 
and replacing it with extra table(s) for each operation type storing the 
operation's data across multiple columns.  This will allow us to 
efficiently query the replication logs based on the operation data.
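
As a rough sketch of what I mean by an "exploded" table, the example 
below uses Python with an in-memory SQLite database purely so that it is 
self-contained.  The table name, column names and the 
last_modified_csn helper are all hypothetical, and the CSN ordering 
(time, then replica ID, then sequence) is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for attribute modifications: the operation's data is
# spread over ordinary columns instead of being hidden in a blob.
conn.execute("""
    CREATE TABLE modify_attribute_log (
        time         INTEGER NOT NULL,
        replica_id   TEXT    NOT NULL,
        sequence     INTEGER NOT NULL,
        entry_uuid   TEXT    NOT NULL,
        mod_type     TEXT    NOT NULL,
        attribute_id TEXT    NOT NULL,
        PRIMARY KEY (time, replica_id, sequence)
    )
""")

conn.executemany(
    "INSERT INTO modify_attribute_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        (100, "A", 0, "uuid-1", "replace", "mail"),
        (200, "B", 0, "uuid-1", "replace", "mail"),
        (300, "A", 0, "uuid-1", "add", "cn"),
        (400, "A", 0, "uuid-2", "replace", "mail"),
    ],
)


def last_modified_csn(conn, entry_uuid, attribute_id):
    """Return the newest CSN (time, replica_id, sequence) at which the given
    attribute of the given entry was modified, or None if never logged."""
    return conn.execute(
        """
        SELECT time, replica_id, sequence
          FROM modify_attribute_log
         WHERE entry_uuid = ? AND attribute_id = ?
         ORDER BY time DESC, replica_id DESC, sequence DESC
         LIMIT 1
        """,
        (entry_uuid, attribute_id),
    ).fetchone()
```

A lookup such as last_modified_csn(conn, "uuid-1", "mail") is the kind of 
query DIRSERVER-894 needs, and it is not possible while the operation is 
stored as a blob.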

Perhaps this would be a good time to make the jump to storing the 
replication data in the DIT.  The DIT seems well suited to storing the 
operations in an "exploded" format.  I am thinking of the following kind 
of format:

ou=logs/
   cn=<csn>/
       objectClass: ... (indicates operation type)
       time: ...
       replicaID: ...
       operationSequence: ...
       entryUUID: ...
       attributeID: <attributeName> (for attribute modifications)
       cn=attributes/
         <attributeName>: <attributeValues>

The biggest concern I have with this is the inflexibility of LDAP 
searches.  Do we have a sort control in ApacheDS?  Also, if the 
attributes for an operation are in a child entry, how can we find an 
operation in the logs based on those attributes?
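
One possible workaround, sketched below in Python against a plain 
dictionary standing in for the DIT, is to search the cn=attributes 
children, map each hit back to its parent log entry, and sort the 
parents by CSN on the client side.  This is only an assumption about how 
it could be done, not something ApacheDS provides; the DNs, values and 
function names are invented, and the DN handling is naive (it ignores 
escaped commas):

```python
# Toy stand-in for the proposed layout: DN -> attributes of that entry.
log = {
    "cn=csn1,ou=logs": {"time": 100, "replicaID": "A",
                        "operationSequence": 0, "entryUUID": "u1"},
    "cn=attributes,cn=csn1,ou=logs": {"mail": ["a@example.com"]},
    "cn=csn2,ou=logs": {"time": 200, "replicaID": "B",
                        "operationSequence": 0, "entryUUID": "u1"},
    "cn=attributes,cn=csn2,ou=logs": {"mail": ["b@example.com"]},
    "cn=csn3,ou=logs": {"time": 50, "replicaID": "A",
                        "operationSequence": 0, "entryUUID": "u2"},
    "cn=attributes,cn=csn3,ou=logs": {"mail": ["b@example.com"]},
}


def parent_dn(dn):
    """Strip the leftmost RDN (naive: assumes no escaped commas)."""
    return dn.split(",", 1)[1]


def find_operations(log, attribute, value):
    """Find log entries whose cn=attributes child holds the given value,
    returned in CSN order (sorted client side)."""
    parents = {
        parent_dn(dn)
        for dn, attrs in log.items()
        if dn.startswith("cn=attributes,") and value in attrs.get(attribute, [])
    }
    return sorted(
        parents,
        key=lambda dn: (log[dn]["time"], log[dn]["replicaID"],
                        log[dn]["operationSequence"]),
    )
```

This costs a second lookup per hit and an in-memory sort, which is why a 
server-side sort control would be preferable if we have one.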

At the same time I am thinking about a couple of things in the 
replication system that don't seem to be necessary.

Firstly, once DIRSERVER-894 is fixed, I don't think we will need the 
entryCSN attribute.  I believe that it is only used to check whether an 
operation should be applied to an entry or not (i.e. is it a new 
modification), but this is broken and we need to check the CSN per 
attribute by using the logs instead.
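
A minimal sketch of the per-attribute check, in Python and with invented 
names (AttributeCsnIndex, should_apply).  It assumes CSNs are compared 
as (time, replica ID, sequence) tuples and that the index is rebuilt 
from, or backed by, the replication logs:

```python
class AttributeCsnIndex:
    """Tracks the newest CSN seen per (entryUUID, attribute ID)."""

    def __init__(self):
        # (entry_uuid, attribute_id) -> (time, replica_id, sequence)
        self._last = {}

    def should_apply(self, entry_uuid, attribute_id, csn):
        """Return True and record the CSN if this modification is newer than
        the last one logged for the same attribute of the same entry."""
        # Attribute IDs are treated as case-insensitive here.
        key = (entry_uuid, attribute_id.lower())
        previous = self._last.get(key)
        if previous is not None and csn <= previous:
            return False  # an equal or newer change has already been applied
        self._last[key] = csn
        return True
```

With a single entryCSN, an older change to one attribute is rejected 
because a newer change touched a different attribute of the same entry. 
Keyed per attribute, the older change to the untouched attribute still 
goes through.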

Secondly, I don't really see the point of "tombstoning" entries (marking 
them as deleted instead of really deleting them).  The only time I can 
see it having any kind of effect is when a replica receives a 
modification for an entry it thinks has been deleted - then it will 
resurrect it.  This seems like a very bad idea to me.  I would expect 
this to be a fatal replication error as something has gone seriously wrong.

Sorry for the long email... if anyone's managed to read this far any 
comments would be much appreciated.

Thanks,

Martin
