From: Alex Karasulu
To: Apache Directory Developers List, elecharny@iktek.com
Reply-To: "Apache Directory Developers List"
Date: Tue, 20 Jan 2009 12:15:45 -0500
Subject: Re: [Replication] Handling Triggers (was: Re: [Mitosis] random thoughts ...)
Mailing-List: contact dev-help@directory.apache.org; run by ezmlm

On Tue, Jan 20, 2009 at 7:22 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:

> On Tue, Jan 20, 2009 at 4:16 AM, Alex Karasulu <akarasulu@gmail.com> wrote:
> > Hi Emmanuel,
> >
> > On Sat, Jan 17, 2009 at 7:15 PM, Emmanuel Lecharny <elecharny@gmail.com>
> > wrote:
> >>
> >> Last but not least: the triggers. If a modification can trigger
> >> others (because integrity constraints are activated), then those
> >> should be logged in the change log. When replicating, the triggers
> >> _must_ be disabled, as the merged operations will contain all the
> >> triggered operations.
> >
> > This is one way to handle it but it could be very expensive. If the
> > trigger firing impacts many entries or results in a cascade of
> > firings, then the cost of replicating the changes could be very large.
>
> Even if you fire the trigger, you will have the same number of changes
> to apply. You just spare the checks and the logic cost.
>
> >
> > Triggers are modeled as entries. As entries they will themselves be
> > replicated. It would be nice if the trigger on a consumer could fire
> > and do all the work so we could avoid unnecessary network traffic.
> > This is all nice but it gets really complicated really fast.
>
> Right, it would spare a hell of a lot of network traffic if triggers
> could be fired instead of deactivated. To do so, we have to add a
> special attribute to each entry modified by the trigger or, even
> better, use a special user (a Trigger user) and put it into the
> creatorsName or modifiersName AT.
>
> >
> > Before going on to talk about triggers, let's stop for a second and
> > talk about how replication events must be handled by a consumer. The
> > consumer must make sure that whatever change is applied to the DIT
> > (except for delete operations) has the proper operational attributes
> > applied. More specifically, the following basic operational
> > attributes need the proper values:
> >
> > createTimestamp
> > creatorsName
> > modifyTimestamp
> > modifiersName
> >
> > So the replication event should contain the who and the time at which
> > the operation actually occurred, rather than the current time for
> > example.
>
> yep.
>
> > Hence replication event processing must perform operations against
> > the DIT with the identity of the client that made the change at that
> > time on the supplier. So unlike a regular operation, an operation
> > that applies replication deltas must use different values for these
> > attributes. In a way this kind of operation is not a direct operation
> > against the consumer, but an indirect operation.
> >
> > Direct operations by clients may raise triggers, which may perform
> > additional operations against the DIT. These triggered operations can
> > themselves raise triggers that cause more changes. A cascade may
> > result, although it should be constrained through various means. The
> > server is designed to track the fact that a triggered change is
> > occurring because of another change. This is tracked through a linked
> > list where at the head you'll find the operation that started it all.
> > All the triggered operations are treated as indirect operations
> > caused by the operation at the head.
> >
> > The point I want to make is that we already have some machinery here
> > for tracking direct and indirect operations. Presently triggers don't
> > work, and the tracking mechanism lacks a way to put the same
> > timestamp on all changed entries as if they happened at the same
> > time; it should have this. The server must treat replication
> > operations at the consumer in a similar fashion and apply timestamps
> > properly. It can also do the same for the changes due to triggers,
> > whether or not the operation in question is replicated.
> >
> > This is the main worry with triggers, and if we can properly solve
> > this problem in a simple and easy to maintain way then we're golden.
>
> Right now, I think that the first step would be to have replication
> working, triggers or not. More specifically, if implementing a first
> version of working replication breaks triggers, then I'm ready to pay
> the price, simply because a server with the best possible trigger
> implementation is worth nothing without working replication.
>
> And I'm pretty sure it will be easier to whip the trigger
> implementation into shape on top of working replication than to try to
> catch all the balls at the same time. We have to learn how to juggle
> with one ball before trying the ten-ball challenge!

Oh yes, I agree completely with your approach. I started this thread for
some background discussion on this specific topic while we focus on
getting replication working, period. It's obvious we just want to get
something working and then iron out the details. Having these
discussions during this time might help us avoid certain pitfalls.

Alex
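
To make the cause-chain idea from this thread concrete, here is a minimal sketch in Java. It assumes one possible reading of the discussion: every triggered (indirect) operation keeps a link to the operation that caused it, and the modifiersName and modifyTimestamp written to a changed entry are taken from the operation at the head of that chain, so a replicated change and everything it triggers carry the original identity and time from the supplier. The types Operation and OperationChainDemo, their methods, and the DNs are hypothetical names made up for illustration; none of this is the ApacheDS API. It also does not cover the alternative raised above of stamping triggered changes with a dedicated Trigger user.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

/** One change applied to the DIT. Hypothetical type, not an ApacheDS class. */
final class Operation {
    // LDAP GeneralizedTime in UTC, e.g. 20090120171545Z
    private static final DateTimeFormatter GENERALIZED_TIME =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss'Z'").withZone(ZoneOffset.UTC);

    final String targetDn;
    final String principalDn; // who made the change on the supplier (head only)
    final Instant timestamp;  // when the change actually occurred (head only)
    final Operation cause;    // null for the operation at the head of the chain

    private Operation(String targetDn, String principalDn, Instant timestamp, Operation cause) {
        this.targetDn = targetDn;
        this.principalDn = principalDn;
        this.timestamp = timestamp;
        this.cause = cause;
    }

    /** A direct client operation, or a replicated change carrying its original who and when. */
    static Operation direct(String targetDn, String principalDn, Instant timestamp) {
        return new Operation(targetDn, principalDn, timestamp, null);
    }

    /** An indirect operation raised by a trigger while this operation was being applied. */
    Operation triggered(String targetDn) {
        return new Operation(targetDn, null, null, this);
    }

    /** Follows the cause links back to the operation that started it all. */
    Operation head() {
        Operation op = this;
        while (op.cause != null) {
            op = op.cause;
        }
        return op;
    }

    /** Operational attributes for the modified entry, always taken from the head of the chain. */
    Map<String, String> modifyAttributes() {
        Operation head = head();
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("modifiersName", head.principalDn);
        attrs.put("modifyTimestamp", GENERALIZED_TIME.format(head.timestamp));
        return attrs;
    }
}

final class OperationChainDemo {
    public static void main(String[] args) {
        // A replicated change keeps the supplier-side identity and time, not "now".
        Operation replicated = Operation.direct(
                "cn=group,ou=system",
                "uid=alice,ou=users,ou=system",
                Instant.parse("2009-01-20T17:15:45Z"));

        // Two levels of trigger cascade caused by that change.
        Operation cascaded = replicated
                .triggered("cn=member,ou=system")
                .triggered("cn=audit,ou=system");

        System.out.println(cascaded.targetDn + " -> " + cascaded.modifyAttributes());
    }
}
```

Because every operation in the chain resolves its attributes through head(), all entries touched by one cascade get the same timestamp, which is the property described above as missing from the current tracking mechanism.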