Date: Thu, 15 Jan 2009 13:20:19 +0100
From: "Emmanuel Lecharny"
Reply-To: elecharny@iktek.com
To: "Apache Directory Developers List"
Subject: Re: [Mitosis] Push or pul, plus some random thoughts

On Thu, Jan 15, 2009 at 12:55 PM, Martin Alderson wrote:
>> the current implementation of replication, AFAICT, is based on a pull
>> system.
>
> The current system seems more like push than pull to me, in the sense that one replica decides to send modifications to another without being asked for them.

Perfectly right.

> The process of A (client) replicating modifications to B (server) at the moment is:
>
> 1. A connects to B.
> 2. Periodically, A asks B for its current CSN vector.
> 3a. If A still has enough modifications in its log to bring B up to date: A sends B all the modifications that it has newer than B's CSN vector.
> 3b. If A doesn't have enough modifications in its log to bring B up to date: A sends B all entries in the DIT.
> 4. B applies the incoming modifications.
>
> There is a problem that you mention, where B goes down and A just keeps trying to reconnect blindly.  It would be nice if the responsibility lay with B to notify all other replicas when it comes back up.

Yep. Otherwise, we use the current retry mechanism: a thread tries to connect to the replica and, if it fails, waits 2 secs, then 4, 8, 16, 32 and 60 secs, until it gets a response back.

>> - how many threads should we have ? A pool or one thread per replicas ?
>
> I would say one thread per replica is OK (I think that's what we have now).  Ideally we would have a system where we don't need to be connected to every replica (i.e. if A is connected to B and B is connected to C we don't need A to be connected to C).
>
>> The connecting replica could send the last entryCSN received, and then a
>> search can be done on the server for every CSN with a higher CSN. Then
>> you don't need to keep any modification on disk, as they are already
>> stored in the DiT.
>
> This would only be OK if you could guarantee that no new modifications would occur on the reconnecting replica until it has been brought back up to date.  I think the current method of just sending the modification logs works better especially when the replica was only disconnected for something like a temporary network glitch.

In fact, when reconnecting, the replica should indicate the latest CSN it received, so the server can push back the modifications from this CSN up to the latest local CSN.

There are two issues with this approach:
- the deleted entries.
- if the replica is connected to more than one other server, it will receive a hell of a lot of modifications from all the connected servers at the same time.
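
The retry schedule and the CSN-based catch-up discussed above can be sketched roughly as follows. This is only an illustration, not Mitosis code: a CSN is reduced to a plain integer instead of a real CSN vector, and the names `retry_delays`, `modifications_since` and `purged_up_to` are hypothetical.

```python
from itertools import islice


def retry_delays(first=2, cap=60):
    """Yield reconnect delays in seconds: 2, 4, 8, 16, 32, then 60 repeatedly."""
    delay = first
    while True:
        yield delay
        # Double the delay after each failed attempt, but never exceed the cap.
        delay = min(delay * 2, cap)


def modifications_since(log, last_csn, purged_up_to):
    """Return the log entries a reconnecting replica still needs.

    log          -- list of (csn, operation) tuples, sorted by csn (assumed)
    last_csn     -- latest CSN the replica reports having received
    purged_up_to -- highest CSN already purged from the log (assumed bookkeeping)

    Returns None when the log no longer reaches back far enough, in which
    case all the entries in the DIT would have to be sent instead (step 3b).
    """
    if last_csn < purged_up_to:
        return None
    return [entry for entry in log if entry[0] > last_csn]


if __name__ == "__main__":
    print(list(islice(retry_delays(), 7)))
    log = [(5, "add"), (6, "modify"), (7, "delete")]
    print(modifications_since(log, 5, purged_up_to=4))
```

Because delete operations sit in the log like any other modification, a replica that is still inside the log window would get them through the same path. The sketch does not address the second issue (several servers pushing to the same replica at once).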
>
> I don't think there's a reason to have a separate list of modification logs for each replica stored in the server - we can just keep the main modification logs around until all replicas have them.

Right. I have overlooked this, I think.

>> The way Mitosis works atm is to keep the deleted entries in the DiT with
>> a added attribute telling if the entry has been deleted, so we keep them
>> in the DiT ( but not available for standard operations) until all the
>> replicas has been updated. So a disconnected replicas which reconnect
>> will get the deleted entry info when it connect back.
>
> As far as I can see mitosis doesn't really use these tombstone entries.  A delete operation is stored in the modification logs which are sent to any replica that doesn't have them yet.

Right now, from the code I can read, the deleted entries are "tombstoned". Maybe we can get rid of this, as we also store the delete operation in the Derby store at this point.

>> How to handle the real deletion is the problem, as we have to keep a
>> state of each replica...
>
> Personally I think we can stop using tombstone entries completely and just rely on the modification logs.  The question then is when do we purge the modification logs?  At the moment a modification log item is purged after a certain (configurable) amount of time.  It perhaps would be nicer if we kept modification log items around while we know other replicas still need them.  This would just involve storing the CSN vector for each known replica.

I rethought about this, and the problem is that we won't be able to resync a server that has been disconnected for too long, unless we simply erase its full base and ask for all the entries. That can be costly when you have millions of entries!
However, in this very case (let's say you keep one week of modifications), if you get out of this window, the best option would probably be to restore the base from a backup (way faster than reinjecting all the entries one by one!), and then resync using the modification log.

So the modification log should only contain a limited number of modifications, depending on the configured storage period. In order to get this working, we have to implement a decent DRS (Disaster Recovery System) too, which is on its way, as it's just a specific implementation of the current changelog interceptor (we have to store the modifications on disk, but not the reverse modifications, as is done with the ChangeLog mechanism).

PS: I will try to summarize all those ideas on the wiki page later. Sadly, I'm pretty busy atm, having to sweat for a client :/

Thanks !

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com
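
For the wiki summary, the two retention ideas from this thread could be compared with a small sketch like the one below. It is only an illustration under stated assumptions: CSNs are plain integers, the one-week window is just the example figure used in this message, and `purge_confirmed`, `resync_strategy` and the returned strategy names are hypothetical, not part of Mitosis.

```python
WEEK_SECONDS = 7 * 24 * 3600  # example retention window: one week


def purge_confirmed(log, replica_csns):
    """Drop log entries that every known replica has already received.

    log          -- list of (csn, operation) tuples (assumed layout)
    replica_csns -- dict mapping replica id to the latest CSN it confirmed
    """
    if not replica_csns:
        # No known replica state: keep everything rather than guess.
        return list(log)
    confirmed = min(replica_csns.values())
    return [entry for entry in log if entry[0] > confirmed]


def resync_strategy(offline_seconds, retention_seconds=WEEK_SECONDS):
    """Pick how to bring a replica back, given how long it was disconnected."""
    if offline_seconds <= retention_seconds:
        # Still inside the window: the modification log alone is enough.
        return "replay-log"
    # Outside the window: restore from a backup first, then replay the log.
    return "restore-backup-then-replay-log"
```

The first function corresponds to keeping log items while some replica still needs them; the second to the time-based window combined with a backup restore. Whether the two should be combined is left open here.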