directory-dev mailing list archives

From Emmanuel Lécharny <elecha...@apache.org>
Subject Re: Subentry cache : one step further
Date Mon, 03 Jan 2011 00:58:52 GMT
On 1/3/11 1:38 AM, Alex Karasulu wrote:
> On Mon, Jan 3, 2011 at 2:09 AM, Emmanuel Lécharny <elecharny@apache.org> wrote:
>
>> On 1/3/11 12:57 AM, Alex Karasulu wrote:
>>
>>> On Mon, Jan 3, 2011 at 1:27 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> SNIP
>>>
>>>> (I still have in mind to add an optional computation of the entries when
>>>> an AP or a Subentry is modified, to avoid a postponed evaluation).
>>>>
>>>>
>>> Could you elaborate on this? I did not quite understand what you mean by
>>> "an optional computation of the entries".
>>>
>> We have three options here :
>> - the current trunk implementation modifies the impacted entries
>> immediately when a Subentry is added/removed/modified (using the
>> SubtreeSpecification). It's costly, but only when we add/remove/modify a
>> subentry.
>> - the current branch I'm working on uses a deferred computation, i.e. the
>> entry's relation to subentries is computed the first time the entry is
>> accessed (either during an addition or a modification, or when read). That
>> means the first read of an entry will imply a write on disk; subsequent
>> reads will be as fast as a normal read. OTOH, the first read of an entry is
>> always costly, as we have to read the entry from the disk (unless it's in cache).
>> - the third option, if we don't want to impact users when adding a subentry
>> when the server is running, is to do as it's done in trunk, ie update all
>> the entries when adding a subentry. But this would be an option that can be
>> activated on the fly (by modifying the server configuration, or by sending a
>> control with the subentry operation).
>>
>> I suggest we go for option #2 atm, assuming that implementing #3 is easy
>> and won't imply a huge refactoring, as the mechanisms used to update the
>> entries are already implemented.
>>
>>
> It's up to you but IMO I don't think this option of delaying updates with
> subentry changes is really worth the complexity and it also introduces other
> serious issues. I wanted to express this thought but you seemed very
> interested in this direction so I let it be.
In fact, the complexity is equivalent. You still have to update all the
entries, which is pretty trivial. The only difference is that when you
grab the entry, you have to check whether it has a reference to a subentry's
UUID and the same sequence number as its parent AP. If we want to
spare this extra processing, then option #3 can be triggered.

OTOH, computing all the entries while we process a subentry 
addition/removal is quite complex, as we have no way to correctly handle 
a server shutdown occurring in the middle of such an operation. One more 
thing : as the operation will be costly, it's unlikely to be atomic.
> Just as a quick idea of what I was thinking. Sometimes a search with the
> right parameters pursuant to a subentry alteration affecting the selected
> region may trigger the entire area to be computed anyway making the lazy
> computation effectively the same thing as eager computation. But this time
> the computation effort is felt on a search operation. This will make our
> performance metrics tests even harder to interpret down the line as well.
This is why I suggested, in a mail I sent two weeks ago, that if the
admin does not want to face such an impact, it's easy to do a full search
and get the whole base updated immediately.
> Furthermore don't we want to know if a subentry altering operation succeeds
> when the administrator performs it? It might be nice to have the operation
> block as well so the admin knows when the operation completes so he can let
> users back on the system.
There is no need to block with option #2, as the admin will know that
the operation has succeeded as soon as he gets the response. We then have
all the information needed, stored in the server, to process any other
operation leveraging the added Subentry:
- the AP is present
- the seqNumbers are updated in the AP
- the subentry is present
- the subentry caches are updated
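To make the list above concrete, here is a minimal Java sketch of what a subentry addition amounts to under option #2. Everything in it is hypothetical: the class, field and method names (AdminPoint, DeferredSubentryManager, addSubentry, subentryCache) are made up for illustration, and the SubtreeSpecification is reduced to a single base DN. It is not the ApacheDS code; it only shows why the operation's cost does not depend on the number of entries the subtree selects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified model for illustration only; these are not ApacheDS classes.
class AdminPoint {
    final String dn;
    // Bumped whenever a subentry below this AP is added, removed or modified.
    long seqNumber;
    // Subentry UUID -> subtree base DN (a crude stand-in for a SubtreeSpecification).
    final Map<String, String> subentries = new HashMap<>();

    AdminPoint(String dn) {
        this.dn = dn;
    }
}

class DeferredSubentryManager {
    // Hypothetical server-wide subentry cache: subentry UUID -> owning administrative point.
    final Map<String, AdminPoint> subentryCache = new HashMap<>();

    /**
     * Option #2: register the subentry and bump the AP's sequence number.
     * No ordinary entry is read or written here, so the cost does not depend on
     * how many entries the subtree selects; those are brought up to date on access.
     */
    String addSubentry(AdminPoint ap, String subtreeBaseDn) {
        String uuid = UUID.randomUUID().toString();
        ap.subentries.put(uuid, subtreeBaseDn); // the subentry is present
        subentryCache.put(uuid, ap);            // the subentry cache is updated
        ap.seqNumber++;                         // the AP's seqNumber is updated
        return uuid;
    }
}
```

Once addSubentry() returns, the response can be sent: every later operation finds the new subentry through the AP and the cache, and any entry that was evaluated against an older seqNumber is known to be stale.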
> Also, when we get local transactions implemented
> in the server subentry alterations should be tied to a single atomic
> operation. If something fails for some reason or another down the line while
> making the updates don't you want to know immediately and roll back?
Something clearly straightforward with option #2, and way more complicated
with option #1 if you have thousands of entries to roll back. In fact, I
don't want to think about the impact it could have with millions of entries
being part of a subtreeSpecification...

Keep in mind that with the deferred computation, knowing whether an entry is
associated with a subentry or not is just a matter of comparing the
seqNumber the entry holds (if any) with its AP's seqNumber; if it's
older, we check whether the entry is part of one subtree, and update its
references to the subentry. All those operations are done in memory and
don't require any disk access, except to store the updated attributes.
And even then, if we don't write this updated entry back to disk, it
doesn't matter: if the server stops abruptly, we will simply recompute
those elements the next time.
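A rough Java sketch of that per-access check, under simplifying assumptions: the names (AdminPoint, DirEntry, LazySubentryResolver, resolve) are hypothetical, and subtree membership is a naive DN-suffix test on already-normalized DNs rather than a real SubtreeSpecification evaluation. It shows the fast path (matching seqNumbers, a single comparison), the in-memory recomputation when the entry is older, and why losing the recomputed values before they reach the disk is harmless.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model for illustration only; these are not ApacheDS classes.
class AdminPoint {
    final String dn;
    long seqNumber; // bumped whenever a subentry below this AP changes
    // Subentry UUID -> subtree base DN (a crude stand-in for a SubtreeSpecification).
    final Map<String, String> subentries = new HashMap<>();

    AdminPoint(String dn) {
        this.dn = dn;
    }
}

class DirEntry {
    final String dn;
    long seqNumber = -1;                              // AP seqNumber last evaluated against; -1 = never
    final Set<String> subentryRefs = new HashSet<>(); // UUIDs of the subentries selecting this entry
    boolean dirty;                                    // recomputed in memory, not yet written back

    DirEntry(String dn) {
        this.dn = dn;
    }
}

class LazySubentryResolver {
    // Naive membership test on normalized DNs; real subtree evaluation (chop, refinements) is richer.
    static boolean inSubtree(String entryDn, String baseDn) {
        return entryDn.equals(baseDn) || entryDn.endsWith("," + baseDn);
    }

    /** Run on each access to the entry; returns true if its references had to be recomputed. */
    static boolean resolve(DirEntry entry, AdminPoint ap) {
        if (entry.seqNumber == ap.seqNumber) {
            return false; // up to date: one comparison, no disk access
        }
        // The entry is older than its AP: re-evaluate it against every subentry, in memory.
        entry.subentryRefs.clear();
        for (Map.Entry<String, String> s : ap.subentries.entrySet()) {
            if (inSubtree(entry.dn, s.getValue())) {
                entry.subentryRefs.add(s.getKey());
            }
        }
        entry.seqNumber = ap.seqNumber;
        // Writing back is optional: if this state is lost (e.g. a crash), the next access recomputes it.
        entry.dirty = true;
        return true;
    }
}
```

Since resolve() only touches the AP and an entry already loaded in memory, the full subtree search mentioned above would simply run it once per entry, and a crash before the dirty entries are flushed leaves nothing to repair.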

All in all, what bothers me the most with option #1 is the failure 
recovery : we don't have any mechanism to restart the processing in the 
middle of it. This is not an issue with #2.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

