directory-dev mailing list archives

From: Howard Chu <...@symas.com>
Subject: Re: [ApacheDS][Mitosis] Replication data
Date: Sun, 02 Dec 2007 00:45:51 GMT

Alex Karasulu wrote:
> No. You're absolutely right. Things like that are hacks. The algorithm 
> needs to make sure these situations are avoided.
> 
> Again, I don't know what the best option is here. I have seen 
> tombstoning referenced all over the papers on replication, and having 
> more information before making this decision is the best course. Your 
> thoughts on the matter, however, do further my doubts about tombstoning. 
> I want to think about the alternative options for handling specific 
> conflict resolution problems.

> For example you have a delete of a node occur right when you add a child 
> to it. The server would probably put the child into some lost and found 
> area and alert the administrator. With tombstoning you can easily 
> resuscitate the deleted parent and move the child back under it. But 
> then again with a full log you can simply recreate the state of the 
> deleted parent anyway so yes alternatives do exist.

> Tombstoning, incidentally, is a freaking nightmare in the architecture 
> that to this day is causing us much grief, plus bugs we have yet to 
> discover. So if you have better alternatives I'm all open to them. I 
> just wish I had two years to really read all those ACM and IEEE papers 
> on the topic and attack this better.

I had considered tombstones at one point for syncrepl as well. The motivation 
was that, in a syncrepl Refresh, we need to see if a consumer's last known CSN is 
still present in the provider's DB. If the last CSN was assigned to a Delete 
operation, then this lookup would fail because there is no entry in the DB 
with that corresponding entryCSN. In this case, the log fulfills the same 
purpose as a tombstone. (Without the log, we simply look for any CSN <= the 
consumer's CSN. If anything older is present in the DB, then we know the 
consumer is not out of date. Otherwise the consumer's state is too old and we 
perform a full resync.)
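
To make the Refresh check concrete, here is a minimal Python sketch. It is an 
illustration of the logic described above, not OpenLDAP's actual code. It 
assumes CSNs are fixed-width strings that sort chronologically; the names 
needs_full_resync, entry_csns and log_csns are made up for the example.

```python
def needs_full_resync(consumer_csn, entry_csns, log_csns=None):
    """Decide whether a consumer must be fully resynced (illustrative only).

    consumer_csn: last CSN the consumer knows about
    entry_csns:   set of entryCSN values currently present in the provider's DB
    log_csns:     set of CSNs recorded in the provider's log, or None if the
                  provider keeps no log
    """
    if log_csns is not None:
        # With a log, a CSN that belonged to a Delete is still findable in
        # the log even though no live entry carries it any more.
        return consumer_csn not in entry_csns and consumer_csn not in log_csns
    # Without a log, any entryCSN at or below the consumer's CSN is taken as
    # evidence that the consumer's state is still usable.
    return not any(csn <= consumer_csn for csn in entry_csns)
```

With a log, a consumer whose last CSN came from a Delete is still recognized; 
without one, the looser "anything at or below the consumer's CSN" test decides.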

For the example you cite above, we just create glue entries for missing parent 
entries. We need that capability anyway, and tombstoning doesn't provide the 
equivalent solution, because we support partial and fractional replication. 
(Active Directory only supports full replication.) E.g., if you define your 
replication criteria with a particular search filter, entries can appear and 
disappear from the replication context arbitrarily, without any add/delete 
operations being involved. If you only have tombstone support, you can't 
handle these cases at all.
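
A rough Python sketch of the glue-entry idea follows. It is only an 
illustration under simplifying assumptions: the directory is a plain dict 
keyed by DN, DNs are split naively on commas (no escaped commas), and the 
"glue" flag and the function names are hypothetical.

```python
def parent_dn(dn):
    """Return the parent DN, or None if dn has a single RDN (naive split)."""
    _, sep, rest = dn.partition(",")
    return rest if sep else None


def add_with_glue(tree, dn, attrs, suffix):
    """Add an entry, creating placeholder glue entries for missing ancestors.

    tree maps DN -> {"glue": bool, "attrs": dict}. Ancestors are created
    only below suffix. Returns the list of glue DNs that were created.
    """
    created = []
    ancestor = parent_dn(dn)
    while ancestor is not None and ancestor != suffix and ancestor not in tree:
        tree[ancestor] = {"glue": True, "attrs": {}}
        created.append(ancestor)
        ancestor = parent_dn(ancestor)
    # A real entry arriving later for a glue DN simply replaces the glue.
    tree[dn] = {"glue": False, "attrs": dict(attrs)}
    return created
```

Because the child is attached under placeholders, nothing depends on a 
delete having been recorded for the missing parent, which is why this also 
covers entries that drop out of a filtered replication context.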

>     It's fair to say that we've faced the same issues already ;) Also
>     our MMR support is still immature, we don't yet do value-level
>     conflict resolution. But the plan for that is pretty
>     straightforward.
> 
> 
> Yeash we have yet to consider that. If you have a clear idea of how it 
> can be done cheap that's great. Can you point us to some documentation 
> on how you've implemented replication?

Well, RFC4533 gives the syncrepl fundamentals. If you just read RFC4533 you 
can get basic single-master replication working. There's no written 
documentation on how we've implemented MMR on top of it (aside from, perhaps, 
discussions on openldap-devel). In some ways we violate the spirit of the 
spec: a consumer is not supposed to be aware of the contents of the syncrepl 
cookie or ever look inside. In OpenLDAP, we expect the cookie to contain CSNs, 
and we parse them with that expectation. (That's a design decision made before 
I got involved. Don't see a way around it though.)

The trick to get from basic single-master to basic (entry-level only) 
multi-master is just to store multiple contextCSNs - one for each peer master, 
and ignore entry updates that are older than an entry's current entryCSN. The 
other requirement here is that you have reliable, tightly synced clocks, 
otherwise the conflict resolution policy falls apart.
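
As a sketch of that entry-level rule (one contextCSN per peer master, and 
stale updates ignored), something like the following Python would do. The 
Replica class and its fields are hypothetical, and CSNs are again assumed to 
be fixed-width strings that sort chronologically, which only works if the 
masters' clocks agree.

```python
class Replica:
    """Toy entry-level multi-master consumer (illustrative only)."""

    def __init__(self):
        self.entries = {}       # entryUUID -> (entryCSN, attrs)
        self.context_csns = {}  # peer server ID -> newest CSN seen from it

    def apply(self, sid, uuid, csn, attrs):
        """Apply a whole-entry update from master sid; True if applied."""
        # Track one contextCSN per peer master.
        if csn > self.context_csns.get(sid, ""):
            self.context_csns[sid] = csn
        current = self.entries.get(uuid)
        # Ignore updates that are not newer than the entry we already hold.
        # Treating an equal CSN as a duplicate is a choice made for this sketch.
        if current is not None and csn <= current[0]:
            return False
        self.entries[uuid] = (csn, dict(attrs))
        return True
```

The whole entry is replaced or left alone, so the outcome is the same on 
every replica regardless of arrival order, as long as the CSNs are comparable.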

>  > (2) I know OpenLDAP leverages a changelog similar but not exactly the
>  > same as our changelog. Perhaps we need to explore this relationship and
>  > figure out how to better leverage this changelog. I think the CSN is
>  > synonymous with a revision except revisions are local and CSNs are
>  > global.
> 
>     Normal syncrepl doesn't rely on any logs; it simply uses entryCSNs. It
>     replicates whole entries (and therefore MMR only provides entry-level
>     conflict resolution).

> Yeah that's a big problem when several scenarios for attribute and value 
> level conflicts arise. You must be adding to this in your implementation 
> to compensate.

For plain syncrepl, we're not going to enhance this any further. It's entry 
based, and last writer wins. It may not always be "right" from someone's point 
of view but it's trivial to guarantee consistency.

>     It can use a session log to optimize the replication of delete
>     operations, but doesn't actually need that.
> 
>     Delta-syncrepl uses the log schema (which I pointed you at already) to
>     replicate only individual changes. 

> Sorry I have no idea what delta-syncrepl is. Is it an RFC I've missed? 
> Can you give us some references?

There's no RFC specifically for it; it's a natural progression from combining 
the log schema with regular syncrepl. (I like lego-block style design...) 
Since syncrepl only sends whole entries, I use a log whose entries record the 
deltas on some other DB. The syncrepl consumer is configured with two search 
bases, the main DB and the log DB. It always starts up by sending a sync 
search against the log DB; if it's not out of date then it will receive the 
log's entries and parse them into appropriate modify operations. If it *is* 
out of date, the provider will return a "ReloadRequired" result and the 
consumer will try again against the main DB, thus it automatically and 
transparently falls back to normal operation until it catches up again.
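
The fallback flow can be sketched in a few lines of Python. This is a 
simplified illustration, not the slapd implementation: the two search 
functions and the two apply callbacks are passed in as plain callables, and 
the ReloadRequired exception is a hypothetical stand-in for the provider's 
"ReloadRequired" result.

```python
class ReloadRequired(Exception):
    """Stand-in for the provider's 'ReloadRequired' result (hypothetical)."""


def refresh(consumer_csn, search_log, search_main, apply_delta, replace_entry):
    """Try the log DB first; fall back to the main DB if the log is too old.

    search_log(csn)  -> iterable of log records, or raises ReloadRequired
    search_main(csn) -> iterable of whole entries
    Returns "delta" or "full" to show which path was taken.
    """
    try:
        # Materialize inside the try so a lazy iterator cannot raise later.
        deltas = list(search_log(consumer_csn))
    except ReloadRequired:
        # Out of date with respect to the log: plain whole-entry syncrepl.
        for entry in search_main(consumer_csn):
            replace_entry(entry)
        return "full"
    # Up to date enough: replay each logged change as a modify operation.
    for delta in deltas:
        apply_delta(delta)
    return "delta"
```

The consumer needs no extra state to choose a path; the provider's answer to 
the first search decides it.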

So the protocol is pretty much identical to RFC4533; the only difference is 
the content of the entries being transferred.

>     This is the mechanism we'll be extending to provide value-level
>     conflict resolution for MMR. The basic approach is that with every
>     delta received, we also send the entry's old entryCSN. If that doesn't
>     match the entryCSN on the replica, then some other write has occurred
>     and there is a potential conflict. At that point we can search backward
>     through the changelog for that entryUUID or entryCSN and find the point
>     of divergence.

> That sounds like a sensible approach. Searching the changelog is the 
> key. I'd love to get the big picture here and try to make sure we can 
> replicate between ApacheDS and OpenLDAP. This would be very beneficial 
> to both user bases.

It sounds to me like you already have all of the schema elements in place to 
get RFC4533 implemented. We can work out the delta MMR stuff as a joint 
project; it's always good to have someone else to check our assumptions.
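
As a starting point for that discussion, the old-entryCSN check from the 
quoted plan above could look roughly like this in Python. It is a sketch of 
an approach that is still only planned, so every name here (check_delta, 
find_divergence, the record fields) is an assumption, as is a changelog held 
as a list ordered oldest to newest.

```python
def find_divergence(changelog, uuid, old_csn):
    """Walk the changelog backward and collect writes the sender never saw.

    changelog: list of {"uuid": ..., "csn": ...} records, oldest first
    Returns the records for uuid that are newer than old_csn, oldest first.
    """
    unseen = []
    for rec in reversed(changelog):
        if rec["uuid"] != uuid:
            continue
        if rec["csn"] <= old_csn:
            break  # reached the point of divergence
        unseen.append(rec)
    unseen.reverse()
    return unseen


def check_delta(replica_csn, delta, changelog):
    """Return the conflicting local writes for a delta, or [] if none."""
    if delta["old_csn"] == replica_csn:
        return []  # no intervening write on this replica
    return find_divergence(changelog, delta["uuid"], delta["old_csn"])
```

What to do with the returned writes (merge per attribute or per value, or 
fall back to last writer wins) is exactly the part still to be worked out.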
-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/
