directory-dev mailing list archives

From Ashish <>
Subject Re: [Mitosis] random thoughts ...
Date Sun, 18 Jan 2009 03:39:09 GMT
Just an input, FYI I am not an LDAP expert :-(

Consider a situation where we have 2 LDAP servers (geographically
redundant) and both remain active. The WAN connection between them
breaks and comes back up after some time n. To keep both of them in
sync we use replication, so that both of them can serve all my users.

1. Will the replication take care of these needs? (The servers may or may
not be time synchronized.)

So is it better for each server to record when it last sent its entries
for replication (using its local clock) and resume from there?
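
To make the question concrete, here is a minimal sketch of the "remember
what I last sent" idea. Everything in it is hypothetical (the ChangeLog
class, its methods, the peer name "B"), and a local counter stands in for
the local clock, since only this server's own ordering matters here.

```python
class ChangeLog:
    """Hypothetical per-server change log with a per-peer 'last sent' mark."""

    def __init__(self):
        self.entries = []      # list of (local_seq, change), oldest first
        self.last_sent = {}    # peer name -> local_seq of the last change sent

    def record(self, change):
        # local_seq is a local counter standing in for the local clock.
        self.entries.append((len(self.entries) + 1, change))

    def pending_for(self, peer):
        # Everything recorded after the last change known to be sent to peer.
        mark = self.last_sent.get(peer, 0)
        return [(seq, change) for seq, change in self.entries if seq > mark]

    def mark_sent(self, peer, seq):
        # Only ever move the mark forward.
        self.last_sent[peer] = max(seq, self.last_sent.get(peer, 0))


log = ChangeLog()
log.record("add cn=alice")
log.record("modify cn=alice")
log.mark_sent("B", 1)           # B received the first change before the WAN broke
log.record("add cn=bob")        # made while disconnected
pending = log.pending_for("B")  # what A still has to send to B
```

This only answers "what do I still have to send to the peer"; deciding how
to order conflicting changes made on both sides still needs timestamps
that can be compared across servers.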

- ashish

On Sun, Jan 18, 2009 at 5:45 AM, Emmanuel Lecharny <> wrote:
> After having discussed with Alex yesterday about replication, I thought a
> bit about what a replication system means, and I came to a point where we
> should not consider replication from a server to server perspective, but as
> a whole. Ok, I know, it's a bit fuzzy. Let me explain what I have in mind.
> First, let's consider that all the servers are connected and replicate
> correctly, without any kind of problem (ie, they never get disconnected,
> they are all time-synchronized, all operations have their unique timestamp).
> In this ideal case, the full set of LDAP servers can be seen as a single
> LDAP server: everything is available from any server, without any
> difference.
> If at least one server gets disconnected, then you have split this virtual big
> LDAP server in two parts: the disconnected server, and the rest of them. As
> the rest are still all connected, and perfectly synchronized, it's really as
> if we had one giant LDAP server again, so we are just facing two LDAP
> servers, disconnected.
> If we move a bit forward, if M servers get disconnected from a group of N
> servers, then we fall back to the same situation: the M servers are seen as
> a single LDAP server, and so are the remaining ones.
> One step further, if the set of servers is fragmented into many small
> disconnected sets, then each of those sets is seen as a single LDAP server.
> Ok, so far so good. Where did it bring us? I think that replication per
> se is just a matter of managing replication between 2 servers; any other
> case can fall back to this category.
> Now, how do we manage replication between server A and server B (whatever
> the number of real servers present in A and B)? Simple: as each operation
> within A or B is done on a globally connected system, with each operation
> having its unique timestamp (ie, two operations have two different
> timestamps), all the modifications done globally are ordered. It's then just
> a matter of merging the two ordered lists of operations from A and B, and
> applying them from the oldest operation to the newest one. Let's see an
> example:
> Server A and server B were synchronized at t0, when the connection was
> broken. Since then, many modification operations occurred on both servers:
> on Server A : op[1, t1], ..., op[i-1, ti-1], op[i, ti], op[i+1, ti+1], ...,
> op[n-1, tn-1], op[n, tn]
> on Server B : op[1, t'1], ..., op[j-1, t'j-1], op[j, t'j], op[j+1, t'j+1],
> ..., op[m-1, t'm-1], op[m, t'm]
> Server A and server B are now connected back to each other. Each
> modification done on B is to be applied on A, and each operation done on A
> must be applied on B. What if some of those operations are conflicting?
> Let's just go back to t0, when both servers were synchronized. If we
> consider that the servers had remained synchronized throughout the
> connection breakage, then A and B would have received the modifications from
> each other at the very moment they occurred, and each conflict would have
> resulted in an error being sent to the client.
> Let's act as if the connection never broke, then:
> we restore the initial state of A and B to t0 (which is possible, as we have
> the ChangeLog system, allowing us to revert to a previous state). Of course,
> we do so on both servers. Now, let's merge the modifications from A and B:
> op[A, 1, t1], op[B, 1, t'1], ..., op[B, j-1, t'j-1], op[B, j, t'j], op[A, i-1,
> ti-1], op[B, j+1, t'j+1], op[A, i, ti], op[A, i+1, ti+1], ..., op[A, n-1,
> tn-1], op[B, m-1, t'm-1], op[B, m, t'm], op[A, n, tn]
> As the operations might have occurred at different times on both servers,
> they have been interleaved, but in any case, as each operation is supposed to
> have a unique timestamp, the resulting list of modifications is still
> ordered, on both servers.
> Now, after having reverted to state t0, we just have to inject the
> modifications from the merged list on A and B, rejecting every modification
> that results in an error. At the end, A and B will be perfectly synchronized,
> without conflicts.
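
The revert, merge and replay steps described above can be sketched roughly
as follows. This is only an illustration under stated assumptions: the
directory is modelled as a plain set of DNs, only "add" and "delete" exist,
and Op, apply_op and merge_and_replay are hypothetical names, not part of
the ChangeLog system or any server API.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    timestamp: int   # assumed unique across both servers
    server: str      # "A" or "B", where the operation was first made
    dn: str
    kind: str        # "add" or "delete" (toy operation set)


def apply_op(state, op):
    """Apply op to the set of DNs; return False if it must be rejected."""
    if op.kind == "add" and op.dn not in state:
        state.add(op.dn)
        return True
    if op.kind == "delete" and op.dn in state:
        state.remove(op.dn)
        return True
    return False     # conflicting or unknown operation


def merge_and_replay(state_t0, log_a, log_b):
    """Start from the t0 state and replay both logs in timestamp order."""
    state = set(state_t0)
    rejected = []
    # Each log is already ordered, so a streaming merge is enough.
    for op in heapq.merge(log_a, log_b, key=lambda o: o.timestamp):
        if not apply_op(state, op):
            rejected.append(op)
    return state, rejected


state_t0 = {"cn=root"}
log_a = [Op(1, "A", "cn=alice", "add"), Op(4, "A", "cn=bob", "add")]
log_b = [Op(2, "B", "cn=bob", "add"), Op(3, "B", "cn=alice", "delete")]
final_state, rejected = merge_and_replay(state_t0, log_a, log_b)
```

Here the add of cn=bob made on A at t=4 is rejected, because B had already
added that entry at t=2. Running the same function on both sides gives the
same final state, which is the point of replaying from t0.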
> Now, remember that A and B are not single servers, but sets of servers. It
> doesn't matter too much, as we can consider that all the servers in set A
> and set B are totally replicated, so they are in the very same state, and
> the merged list can just be applied the same way to any server from A and B.
> What if we have many groups of disconnected servers? This is a bit more
> complex, but not so much. We just have to replicate the groups 2 by 2, or
> assume they are replicated 2 by 2, and at the end of a potentially long
> process, where we revert back to the time the servers were disconnected and
> reapply all the merged modifications, we will be back in the same state for
> all the servers.
> There are only two conditions we must meet:
> 1) the servers must be time synchronized,
> 2) the modification timestamps must be unique, whatever server they have
> been made on.
> Condition 2 can easily be met with the existing CSN, if we consider that
> there is an order in the replicas (ie A < B < C, ... where A, B, C are the
> replicas' ids). This is purely conventional, but necessary.
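
One way to picture condition 2 is a stamp that compares by time first and
breaks ties with the replica id. The Stamp class below is a simplified,
hypothetical stand-in and is not meant to reproduce the actual CSN layout;
the seq field is an assumed per-replica counter for operations that land in
the same millisecond on the same replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Stamp:
    # order=True compares fields in declaration order, like a tuple.
    time_ms: int       # wall-clock time of the operation
    replica_id: str    # conventional order between replicas: "A" < "B" < "C"
    seq: int = 0       # assumed per-replica counter within one millisecond


stamps = [
    Stamp(100, "B"),
    Stamp(100, "A", 1),
    Stamp(99, "C"),
    Stamp(100, "A", 0),
]
ordered = sorted(stamps)
```

Two operations made at the same instant on A and B then still get a fixed
relative order (A first), so the merged list is the same on every server.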
> Regarding condition #1, we can't guarantee that all the servers will use the
> same time. We just do our best to get this as accurate as possible.
> Last but not least: the triggers. If some modification can trigger others
> (because of integrity constraints being activated), then the triggered ones
> should be logged in the change log. When replicating, the triggers _must_ be
> disabled, as the merged operations will contain all the triggered operations.
> Ok, I'm done now. All this is of course a coarse approximation, but I think
> it's pretty close to what we need to deal with.
> Please just tell me if I'm not totally off the rails, or if you think I have
> just done too much pot lately ;)


