Subject: Re: [ApacheDS] Implementing isolation using multi-version concurrency control (MVCC)
From: Alex Karasulu <akarasulu@gmail.com>
To: Apache Directory Developers List <dev@directory.apache.org>
Date: Sat, 31 Jan 2009 11:12:23 -0500

On Sat, Jan 31, 2009 at 5:25 AM, Emmanuel Lecharny <elecharny@apache.org> wrote:

> Alex Karasulu wrote:
>
>> Hi all,
>>
>> Emmanuel and I were having an interesting conversation about the kinds of
>> transaction properties needed by ApacheDS to comply with various
>> specification requirements. For example, all LDAP operations must be atomic
>> and must preserve consistency. Furthermore, one can easily make a case for
>> isolation, where an operation neither sees nor is impacted by the partial
>> changes of another concurrent operation.
>>
>> We started discussing these ACID properties and ways in which they could be
>> implemented. Isolation is very interesting now thanks to directory
>> versioning and the change log. We can easily implement isolation now.
>> When a relatively lengthy operation like a search is being conducted, it
>> should not see modifications within scope that occur after the search began.
>> The search operation in the example can be replaced with any other operation
>> except all unbind requests, some extended requests, and all abandon requests.
>
> Atomicity and Isolation are both complex to guarantee in an LDAP server.
>
> If we think about Atomicity, for instance, even if we can guarantee it
> somehow for some simple operations like Modify, Add or Delete, for the ModDN
> operation it is not that simple. We have to guarantee that all the potential
> renames are done - or reverted - as a whole.

Yep. I was thinking about this exact example and how the modifyDn operation
could potentially change a lot of descendants.

> As this operation can impact a big part of the server, and could take
> several seconds (minutes, hours, depending on the number of entries), this
> is obviously not trivial. However, ModDN operations aren't the most frequent
> ones. Regarding the simpler operations (add, delete and modify), I think we
> should implement some kind of "transaction" in the backend: the modified
> entry has to be tagged as 'under modification' until the backend has applied
> the modification correctly (or rolled it back).

Well, I'd like to avoid using the partition to implement something like this,
since it is not a generalized solution. When this capability is pushed into a
partition, then all implementations will need to implement it their own way.
Partitions are already complicated to implement, so adding this requirement on
top can make it overwhelming. We want people to be able to make partitions for
whatever data they want to present through LDAP, to create a rich environment
for choosing different partitions.

For example, JDBM offers transactions. We could have piggybacked on this
feature to provide ACID properties under JDBM. There are some quirks with how
JDBM does this, but it can be leveraged. I decided a while back, while
implementing the partition, not to do this because I wanted the transaction
management to be handled above the partitions.

> Then we can untag the entry, and it's available again. How the CL can come
> into play here is to be discussed. IMHO, the CL and this 'transaction' will
> work hand in hand at some point.

Yes, although I don't know how yet, I agree that the CL is key here. Perhaps
we need to build a transaction manager on top of the CL that handles and
tracks these things as well as manages the transactions. The tracking of a
transaction context, its commit, or its rollback is all up to a transaction
manager. The CL is just there for history and logging, and is something the
transaction manager uses as a tool to do its job.

> Regarding isolation, it's a bit more difficult, as a search can already have
> sent back some results which could be changed by another modify operation.
> This is especially the case for a ModDN operation.
>
>> The change log not only tracks each change, but also allows the directory
>> server to inject the "revisions" attribute into entries. The revisions
>> attribute is multi-valued and lists all the revisions of changes which
>> altered the entry. For the search example, we can conduct the search
>> operation while locking it down to a revision.
>
> That does not work for deleted entries, obviously ...

Sure it can work for deletes. You must have written this without reading
further. Basically, according to the change log, any entries with changes
after time S, when the search began, are evaluated to see if they should be
included in or removed from the result set returned by the search operation.
You don't just let search produce candidates, return them all, and forget
about the changes that occurred after the search began.
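To make that a bit more concrete, here is a very rough sketch of the kind of
check I have in mind. None of the types below (VersionedEntry, ChangeLog,
Filter) are real ApacheDS interfaces - they are just stand-ins for the change
log machinery and the filter evaluator:

import java.util.List;

// Sketch only: lock a search down to the revision S at which it started,
// using the change log to rewind entries that were touched after S.
public class RevisionLockedEvaluator {

    // Hypothetical view of an entry together with its revision history.
    public interface VersionedEntry {
        long highestRevision();                  // latest revision that touched the entry
        boolean existedAt(long revision);        // false if only created after 'revision'
        VersionedEntry rewindTo(long revision);  // reverse-patch recorded LDIFs back to 'revision'
    }

    // Hypothetical stand-in for a parsed LDAP search filter.
    public interface Filter {
        boolean matches(VersionedEntry entry);
    }

    // Hypothetical change log handle used to resurrect entries deleted after S.
    public interface ChangeLog {
        List<VersionedEntry> entriesDeletedSince(long revision);
    }

    private final long searchRevision;  // S: the DIT revision when the search began

    public RevisionLockedEvaluator(long searchRevision) {
        this.searchRevision = searchRevision;
    }

    // Decide whether a candidate produced by the index scan belongs in the results.
    public boolean accept(VersionedEntry candidate, Filter filter) {
        if (candidate.highestRevision() <= searchRevision) {
            // Untouched since the search started: evaluate it as-is.
            return filter.matches(candidate);
        }
        if (!candidate.existedAt(searchRevision)) {
            // Added after the search started: invisible to this search.
            return false;
        }
        // Modified after S: reverse-patch to its state at S, then re-evaluate.
        return filter.matches(candidate.rewindTo(searchRevision));
    }

    // Entries deleted after S still belong in the results if their state at S matched.
    public void injectDeleted(ChangeLog changeLog, Filter filter, List<VersionedEntry> results) {
        for (VersionedEntry deleted : changeLog.entriesDeletedSince(searchRevision)) {
            VersionedEntry atS = deleted.rewindTo(searchRevision);
            if (filter.matches(atS)) {
                results.add(atS);
            }
        }
    }
}

This is just the "conditionally filtering out or injecting candidates" idea
quoted below, written out.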
>> This is best implemented by conditionally filtering out or injecting
>> candidates with revisions greater than the revision at which the search
>> operation started. Let's call the revision when the search started S.
>> Entries in the server which possess revision numbers greater than S need
>> further evaluation: we have to evaluate whether the filter matches these
>> entries as they were at revision S. This may require some reverse patching
>> and re-evaluation of the filter on the patched entry in state S. This is
>> not so bad, because there usually are not that many changes taking place at
>> the same time on the same entry, meaning the number of LDIFs to patch onto
>> an entry to evaluate its former state at S will be small. This way we
>> effectively lock the search down to a specific revision, giving the search
>> operation what appears to be a snapshot of the DIT. The search results will
>> not be impacted by any concurrent changes.
>
> Well, I don't think this is the best approach. In any case, an LDAP search
> is considered a dirty read. We have no way to lock down modifications on the
> read entries, so the user has _no_ guarantee that the entry he gets back
> will still be valid. Usually it doesn't matter, because the ratio of reads
> vs. writes on an LDAP server is just so big that we can consider there are
> no modifications. So we can simply return the latest revision, whatever it
> is. Anyway, there is another aspect we have to consider: once the user gets
> his entry, and before he uses it or potentially sends it back as a
> modification to the server, the very same entry can have been modified in
> between. As we don't lock entries, we can't protect users from such a case.

The point is, the search should be conducted as if it were performed on a
snapshot of the DIT at the time the search was issued. If you get dirty reads,
then you can have inconsistent views of the information, which is rare, yes,
but possible.

The beauty of this approach is that it allows us to search the DIT at any
revision, not just the one current when the search occurred (time S as stated
above). This actually allows us, for example, to implement a "search in the
past" control where the user can conduct a search against the server's state
at an earlier time. This is insanely powerful in terms of having versioned
views of the information in the server. Say you have some application's
configuration in the DIT, and you lock the application down to read at some
time/revision associated with a release date for the application. The
application pulls its configs from the DS with this control. Then you continue
making changes to the configuration data as you change the application in
development. You can set up and run both the production application and the
development application, just using different revisions when performing the
lookups.
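Just to show how small the client side of such a "search in the past" control
could be, here is a sketch. The OID and the value encoding are completely made
up for illustration - a real control would carry a proper BER-encoded revision
and would have to be registered with and handled by the server:

import java.nio.charset.StandardCharsets;
import javax.naming.ldap.BasicControl;

// Hypothetical request control asking the server to evaluate the search
// against the DIT as it was at the given change log revision.
public class SearchAtRevisionControl extends BasicControl {

    // Placeholder OID, not a registered ApacheDS control.
    public static final String OID = "1.2.3.4.5.6.7";

    public SearchAtRevisionControl(long revision, boolean criticality) {
        // Sketch only: ship the revision as ASCII digits instead of BER.
        super(OID, criticality, Long.toString(revision).getBytes(StandardCharsets.US_ASCII));
    }
}

A client would attach it to a search with LdapContext.setRequestControls(), and
an interceptor on the server side would lock the search's evaluation down to
that revision instead of the current one.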
The power this provides for configuration management is awesome, to say the
least. Now I am digressing.

> <digression>
> We have to remember here that we are not dealing with bank accounts and
> balances, or nuclear plants. In most cases, we are using an LDAP server to
> handle identities. They rarely change, or when they do, it's because the
> person owning the identity is changing his own identity - thus limiting the
> odds that he is using it at the same time. Usually, even if we think about
> authorizations, which are subject to way more changes than identities, we
> can't consider it a continuous flow of modifications.

Yes, yes, I think you know that I would know this :-). However, we're just
thinking through these features. We have to know how much of a pain it is by
going into the details, then stepping back and looking at the big picture. I
finished reading the digression, and I think we've told each other this
several times in the past.

I used to think this way a lot, but there are several things that have
impacted my views. For example, we need atomicity, period, which is a bigger
deal with ModDN. With triggers we need atomicity badly too. These are protocol
requirements, and there's no arguing with that. To do all this we're going to
need a transaction manager that can begin, commit, or roll back sets of
changes to the DIT.
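Roughly, I picture its surface looking something like the sketch below.
Nothing like this exists in ApacheDS today; the names are hypothetical, and
the point is only that the begin/commit/rollback bookkeeping lives above the
partitions and leans on the CL for the actual history:

import java.util.List;

// Hypothetical transaction manager layered on top of the change log (CL).
// Partitions stay simple; rollback is just reverse-applying recorded revisions.
public interface DitTransactionManager {

    // A transaction context tracking the revisions written on its behalf.
    interface TxnContext {
        long startRevision();    // revision of the DIT when the transaction began
        List<Long> revisions();  // change log revisions recorded so far for this txn
    }

    // Start tracking a set of changes (e.g. all the renames under one ModDN).
    TxnContext begin();

    // Make the tracked changes visible as a single unit.
    void commit(TxnContext txn);

    // Reverse-apply every recorded revision, newest first, using the CL.
    void rollback(TxnContext txn);
}

A ModDN would then be a begin(), one recorded revision per renamed descendant,
and either a commit() or a rollback() if anything fails partway through.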
> In other words, creations or deletions of entries might be frequent;
> modifications should be quite rare.

In this day and age directories are being used in a myriad of ways -
especially after the identity buzz. I don't think we can be as comfortable
generalizing about the expected scenario as we used to be.

> I have gathered some stats from some of my clients, and the ratio of changes
> to reads is something like 1/5000... I would be interested to get more
> numbers here!
>
> Last but not least, I consider that if the ratio goes up to something like
> 1/100, then it's time to consider using a transactional system, namely an
> RDBMS.

Yeah, stats are good, but we need a lot of them from different industries to
understand this picture.

> </digression>
>
> So far, I'm not saying that it's wrong to think about using an MVCC system,
> but I'm a bit skeptical about the gain in terms of isolation (the I in ACID)
> it will offer in our case.

We're doing it somewhat today, just not going all the way. This whole
versioning facility is the basis for it. I don't think I understand why you're
skeptical. I hope to explore where this sense is coming from, because maybe
you have a seriously important reason behind it and just are not vocalizing
it.

Thanks,
Alex


