Hi Emmanuel,

On 9/26/07, Emmanuel Lecharny <elecharny@gmail.com> wrote:
Suppose we want to play for a while on the data, here is the sequence
of operation I'm thinking about :
1) start a 'Mark' (I prefer using 'Mark' instead of 'Transaction', for
semantic reasons)

Call it a revision or a tag.  I want to use SVN-like names here instead of making up our own, since
people understand SVN concepts and this mechanism will be very similar.

3) inject whatever we want into the server (add, del, modify...)
4) do a rollback (the anti-operations are committed)

At the end, the server will be in a consistent state.

Well this is not a matter of consistency perhaps.  You will roll back to an earlier
state but the revision number will increase similar to the way you
merge back to an earlier state in SVN but then commit forward.  Example:

    revisions: 0 1 2 3 4 5 6 7 8
    At rev 8 you want to roll back to the state at rev 5, then commit forward to revision 9

In this case SVN applies a series of diffs from this commit backwards.  This is very similar
to what we will be doing, except that we apply LDIFs and our revisions will increase by 3 in
the example (from 8 to 11), since there are 3 reverse LDIFs to apply.
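
To make the numbering concrete, here is a rough sketch in Java of how I picture it.  The names
(RevisionLog, Change, revertTo) are made up for illustration and are not existing server API, and
plain strings stand in for the forward and reverse LDIFs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not existing server API: every revision stores a
// forward change and the reverse change that undoes it.
class RevisionLog {
    static final class Change {
        final String forward;
        final String reverse;

        Change(String forward, String reverse) {
            this.forward = forward;
            this.reverse = reverse;
        }
    }

    // changes.get(i) moved the server from revision i to revision i + 1.
    private final List<Change> changes = new ArrayList<>();

    long currentRevision() {
        return changes.size();
    }

    // Records one change and returns the new revision number.
    long log(String forward, String reverse) {
        changes.add(new Change(forward, reverse));
        return currentRevision();
    }

    // Replays reverse changes, newest first, until the state matches
    // targetRevision. Each reverse change is logged as a new revision,
    // so the revision number keeps increasing.
    List<String> revertTo(long targetRevision) {
        long head = currentRevision();
        if (targetRevision < 0 || targetRevision > head) {
            throw new IllegalArgumentException("no such revision: " + targetRevision);
        }
        List<String> applied = new ArrayList<>();
        for (long rev = head; rev > targetRevision; rev--) {
            Change undone = changes.get((int) (rev - 1));
            applied.add(undone.reverse);
            // The reverse of a reverse change is the original forward change.
            log(undone.reverse, undone.forward);
        }
        return applied;
    }

    public static void main(String[] args) {
        RevisionLog log = new RevisionLog();
        for (int i = 1; i <= 8; i++) {
            log.log("add entry" + i, "delete entry" + i);
        }
        System.out.println(log.revertTo(5));       // reverse changes for revs 8, 7, 6
        System.out.println(log.currentRevision()); // 11
    }
}
```

Logging each reverse change as its own revision means the history is never rewritten, so the
rollback itself can be undone the same way.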

This leads to some tricky points :
1) Some partitions may be treated differently (schema ?) : we may need
a level of protection

Ok, you want to scope out different changes here so you can apply only those that
you want.  Again, this is easy to do once we have query capabilities on the log.  We
will be able to ask it for all the changes that took place under a DN on a given set of
attributes, etc.  This way you can pick the things you want to revert in a particular subtree.
This is going to be very powerful.
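
As a rough idea of what such a query could look like, here is a small Java sketch.  Everything in
it (ChangeLogQuery, LoggedChange, changesUnder) is a hypothetical name, the log is just an
in-memory list, and the subtree test is a naive string match that assumes DNs and attribute names
are already normalized.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: filter logged changes by subtree and by attribute.
class ChangeLogQuery {
    static final class LoggedChange {
        final long revision;
        final String dn;
        final Set<String> attributes;

        LoggedChange(long revision, String dn, String... attributes) {
            this.revision = revision;
            this.dn = dn;
            this.attributes = new HashSet<>(Arrays.asList(attributes));
        }
    }

    // Naive subtree test: dn is baseDn itself or ends with "," + baseDn.
    // Assumes both DNs are already normalized (no stray spaces or escapes).
    static boolean isUnder(String dn, String baseDn) {
        String d = dn.toLowerCase(Locale.ROOT);
        String b = baseDn.toLowerCase(Locale.ROOT);
        return d.equals(b) || d.endsWith("," + b);
    }

    // Returns the changes under baseDn that touched at least one of the
    // given attributes, in log order.
    static List<LoggedChange> changesUnder(List<LoggedChange> log, String baseDn,
                                           Set<String> attributes) {
        return log.stream()
                .filter(c -> isUnder(c.dn, baseDn))
                .filter(c -> !Collections.disjoint(c.attributes, attributes))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<LoggedChange> log = Arrays.asList(
                new LoggedChange(1, "cn=alice,ou=users,ou=system", "mail"),
                new LoggedChange(2, "cn=bob,ou=users,ou=system", "cn", "sn"),
                new LoggedChange(3, "cn=admins,ou=groups,ou=system", "mail"));
        Set<String> attrs = Collections.singleton("mail");
        for (LoggedChange c : changesUnder(log, "ou=users,ou=system", attrs)) {
            System.out.println(c.revision + " " + c.dn);
        }
    }
}
```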

Does this answer the question?

2) The logger will become a bottleneck, as we will have to synchronize
the concurrent access to the storage

Well yes, this is true.  I don't know how to avoid this safely.  You can send messages to the
log asynchronously and let it write them out in the background, but this is dangerous since you
can lose data.

3) if we want to rollback the operations, the server should not be
able to process any other operations until the rollback is done

Don't know if this is absolutely true.  As long as the operation does not conflict with the changes
we should be ok.  We can also quickly determine which subtrees or entries are affected by a rollback
and check fast to see if an operation is going to be in conflict.
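
Something along these lines is what I have in mind for the conflict check.  This is only a sketch
in Java: RollbackGuard and its methods are invented names, the DN comparison is a naive string
match on already-normalized DNs, and the rule is deliberately conservative.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: decide whether an incoming operation may run while
// a rollback is in progress.
class RollbackGuard {
    private final Set<String> affectedDns = new HashSet<>();

    // dnsTouchedByRollback: DNs of the entries the reverse changes will modify.
    RollbackGuard(String... dnsTouchedByRollback) {
        for (String dn : dnsTouchedByRollback) {
            affectedDns.add(normalize(dn));
        }
    }

    private static String normalize(String dn) {
        return dn.trim().toLowerCase(Locale.ROOT);
    }

    private static boolean isSameOrUnder(String dn, String base) {
        return dn.equals(base) || dn.endsWith("," + base);
    }

    // Conservative rule: an operation conflicts when its target is an
    // affected entry, lies below one, or is an ancestor of one (which
    // matters for subtree operations such as a move or a delete).
    boolean conflicts(String targetDn) {
        String target = normalize(targetDn);
        for (String affected : affectedDns) {
            if (isSameOrUnder(target, affected) || isSameOrUnder(affected, target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RollbackGuard guard = new RollbackGuard("ou=users,ou=system");
        System.out.println(guard.conflicts("cn=alice,ou=users,ou=system"));
        System.out.println(guard.conflicts("ou=groups,ou=system"));
    }
}
```

The linear scan is just for readability; with many affected entries an index keyed on DN
suffixes would be the obvious next step.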

If we seriously consider using this mechanism for something more
critical, like storing a journal we can replay on crash (useful with
the deferred write mechanism we have), then other elements come into
play :

This overloads the purpose past revision control and tries to use the change log
as a transaction log.  That is a different function with different requirements.
Perhaps we can do both, but we should not overload it at this point or else we
will not get anywhere with this feature.

1) we must flush the data as soon as they arrive, on disk

Yep, this is key for a txn log or else you lose the data.

2) we have to think about a recover mechanism, which should compare
the current database state and the current journal

With revisions this is easy to do.

2-1) this recover mechanism will have to know which data has been
flushed to the backend, otherwise we may have a difference between the
journal and the backend. Namely, the Sync thread should be driven by
the ChangeLog interceptor (when the commit is submitted, then the
sync thread is woken up and flushes data to the backend, marking the
entries in the log when they are written)

Again, I think it is an incredibly bad move to mix these two concerns together.  We need a
separate transaction log or need to leverage the one that exists in the backing stores.  I prefer our
own transaction log, but this should be a separate subsystem altogether, maybe based on HOWL.

Then the two subsystems can complement each other.  Keep it simple, without overloading
functionality onto one subsystem, so we can actually get work done rapidly and maintain them
better.

In any case, we also need a mechanism to activate the ChangeLog operations :
- startLog

The log can be started and stopped, but not if it's also a transaction log, so let's not mix
these concerns and mess up both of them.

- beginMark

Ok, using SVN language, this is a tag.  You can tag several revisions that are of significance, so there is
not just one tag (mark).  For the testing situation you have to deal with, yes: you get the current revision of
the server before starting a test, then roll back to that revision after the test is complete.  But I recommend
just turning on the change log at the end of setUp and turning it off in tearDown, then applying the reverse.ldif.

- commit

Right now we don't commit several operations in one.  We do not have transactions.

- rollback
- stopLog

Ok we are mixing lots of things here.  I think we're going to get lost in the woods.

What would be the best solution ?

I would take things one step at a time because I get confused easily.  The first step should just be to build
the simplest implementation to capture the changes and produce a forward and reverse diff log.  That's the
first step and we can use this immediately with the test rollback requirements we have.
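
A bare-bones sketch of that first step, in Java, could look like this.  The class and method names
are invented for illustration, only add and delete are covered, and attributes are treated as
single-valued plain strings (no multi-valued attributes, no base64 encoding).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: for each captured operation, produce the forward
// LDIF and the reverse LDIF that undoes it.
class DiffLogSketch {
    static String addLdif(String dn, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        sb.append("dn: ").append(dn).append('\n');
        sb.append("changetype: add\n");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    static String deleteLdif(String dn) {
        return "dn: " + dn + "\nchangetype: delete\n";
    }

    // Returns {forward, reverse}: the reverse of an add is a delete.
    static String[] captureAdd(String dn, Map<String, String> attributes) {
        return new String[] { addLdif(dn, attributes), deleteLdif(dn) };
    }

    // Returns {forward, reverse}: the reverse of a delete is an add of the
    // entry as it was before the delete, so the old attributes are needed.
    static String[] captureDelete(String dn, Map<String, String> attributesBeforeDelete) {
        return new String[] { deleteLdif(dn), addLdif(dn, attributesBeforeDelete) };
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("objectClass", "organizationalUnit");
        attrs.put("ou", "test");
        String[] pair = captureAdd("ou=test,ou=system", attrs);
        System.out.println("forward:\n" + pair[0]);
        System.out.println("reverse:\n" + pair[1]);
    }
}
```

For the test rollback case, the reverse halves written out in reverse order are what would end up
in the reverse.ldif.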

We can do that with a specific
control, an extendedOperation or a standard modifyRequest of a
specific entry in the ou=system partition (remember the 'configuration
in the DIT' thing ?) coupled with a trigger and SP.

Oh man, this is a separate thread on its own.  For now I suggest we keep things simple, make progress,
and flesh out the issues over time while solving some immediate concerns.

Let's divide and conquer the problems so they're not so overwhelming.  If we try to solve every problem
all at once then we cannot start and finish something that can give us value.

Alex