directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lecharny <elecha...@gmail.com>
Subject Re: AdministrativePoint handling
Date Tue, 14 Dec 2010 09:54:31 GMT
A branch has been created :

https://svn.apache.org/repos/asf/directory/apacheds/branches/trunk-with-dependencies-AP

On 12/13/10 6:00 PM, Alex Karasulu wrote:
> On Mon, Dec 13, 2010 at 12:41 PM, Emmanuel Lecharny<elecharny@gmail.com>wrote:
>
>> Hi guys,
>>
>> I'm resuming the APs implementation. I have had a new idea this morning
>> about the best way to handle them.
>>
>> Currently, we had devised about too possible options
>>
>> 1) Processing the entries while adding a new AP
>> The idea is that when we add a new AP (or delete an existing one, or
>> modifying an existing one), we evaluate all the underlying entries and
>> update each one which is selected by the associated SubtreeSpecification by
>> adding a reference to its associated AP in the entry.
>> This is done once, then when a user grab an entry, we don't need to
>> evaluate it again, as this work has already been done, so we accept to pay
>> the price once, but it's free for every operation after this initial update.
>>
>> The price to pay is pretty expensive, if we consider a huge base with
>> hundred of thousands entries, as we have to update many, and as an update is
>> slow (we currently can process 600 per second of such updates - depending on
>> the server we are running on, of course -, so for 1 000 000 entries it might
>> take up to 30 minutes ).
>>
>> 2) Processing the evaluation on the fly
>> The second idea is to evaluate the entry when a user requests it, to avoid
>> paying the price for the initial update.
>>
>> The big problem with this approach is that if we have many APs
>> SubtreeSpecifications to evaluate, this evaluation is not free, plus it has
>> to occur *everytime* a user access the entry. That could slow down the
>> server a lot (probably double the time it takes to return an entry).
>>
>> So far, we have decider to go for option #1, simply because adding an AP is
>> an administrative task, which does not occur very often. We all realize that
>> it has drawbacks, but it's far better than option #2 anyway.
>>
>> Now, I'd like to propose a third option, which might be acceptable,
>> considering that we will pay a small price compared to both option #1 and
>> #2.
>>
>> 3) Processing on the fly and store the result
>> The idea is to process the evaluation when someone fetches an entry, but to
>> store the evaluation result in the entry. If the fetched entry has already
>> been evaluated, then no need to process it again.
>>
>> The direct benefit will be that we won't have this huge initial update
>> required by option #1, and the entry evaluation wil be just a matter to
>> check if an AT has been set instead of fully evalute the entry as in option
>> #2.
>>
>>
> This sounds like an excellent compromise.
>
> Is there a chance though of triggering a large set of these updates on for
> example a search making this approach in practice perform close to approach
> #1? Let me explain:
>
> Usually directories containing millions of entries have most of their
> entries under a small number of containers like ou=users for example.
> Directories today are flatter than in the past where there was more depth
> and hierarchy.
>
> Using a search filter without many constraints (AVAs) under a congested base
> with one or sub tree scope may trigger the need to evaluate a large set of
> entries.
>
> To think this through let me just for the record go through the way search
> interacts with the administrative model subsystem. I won't consider
> base/object scoped searches.
>
>     (1) On the way into the server a search may have some evaluations done on
>
>     (2) The search call returns a Cursor for the search request parameters
> positioned
>           at the first candidate to return. This may have actually processed
> several
>           candidates that do not comply with the filter. During candidate
> processing
>           administrative aspects will be considered, so subtree specification
> evaluation
>           will occur.
>
>     (3) There after calls to get the next candidate and so on repeat the
> steps in #2
>           until the Cursor is exhausted or search limits if in effect halt
> the search.
>
> In step #2 evaluations checking entries for inclusion in administrative
> areas defined by subtree specifications, occur only after the filter's
> constraints have been applied and the entry in question passes those
> assertions.
>
> So a search can only trigger updates on entries satisfying it's filter. Also
> this will be shorted by search limits as well for standard users
> (non-admin). If the filter is not very restrictive for example
> (objectClass=*) then this is bad but search limits should constrain the
> number of updates performed.
>
> So only a subset of the updates will take place. It does sound good going
> through the entire thought process.
>
> How will it works ? Simple : when an AP is added,
>
>
> What about when the AP is modified and deleted ?
>
>
>> a Timestamp (TS) is updated.
>
> Is this a global timestamp used for this purpose system wide?
>
>
>> This TS will be incremented every time we add a new AP.
>
> Again also when we update or delete an AP (implies subentries as well)?
>
>
>> We don't update any entry.
>> Then when a user fetch some entries, for every one of the selected entries,
>> we check if it has a TS AT present.
>
> I presume this timestamp attribute is separate from the modifyTimestamp or
> createTimestamp right?
>
> So there's a special timestamp in the entry and on somewhere in the server
> to do these TS comparisons?
>
>
>> If not, then we evaluate the entry against the AP it depends upon, and when
>> done, we add the last TS into the entry, save it and return (or not) the
>> entry to the user.
>> If the entry has a TS AT, then we have two cases
>> - The TS equals the last TS : the entry has already been evaluated, and we
>> can return it (or not, depending on the evaluation result)
>> - The TS is below, so we reevaluate the entry.
>>
>> Now, how will we distinguish between an entry evaluating to true or false ?
>> Easy : if the entry is selected by a SubtreeSpecification, we will have a
>> ref to the associated AP in it, otherwise no.
>>
>>
> So you have to update the timestamp and the ref back to the AP containing
> the subentry selecting it right?
>
>
>> I do think that this third option has great advantages, and not only from
>> the user or admin POV : the implementation will be greatly improved, as we
>> don't have to deal with all the cases when handling an AP
>> addition/modification/deletion.
>>
>>
> It's still somewhat hazy.
>
>
>> One last thing : one may object that the on the fly evaluation is not
>> acceptable, as it will cost a lot for the first access to an entry. True.
>
> I'd think it's very acceptable but does the gain outweigh the complexity is
> my question and I still don't fully understand your approach.
>
> This is a really complex part of the server and I'm hoping we can find ways
> to simplify it further.
>
> At some point you had a great idea of using indirection via UUID instead of
> a direct reference with DN back to the AP.  Are you still thinking about
> this approach. I need to dig up that email you originally had sent to the
> dev list. It was a great idea.
>
>
>> But I claim that it's the administrator to deal with this problem, and then
>> its not anymore a user issue. When the admin add an new AP, he also can
>> fetch all the entries below the added AP, and they will be updated on the
>> fly. The users won't be impacted anymore.
>>
>> thoughts ?
>>
>>
> The idea is great but I'm worried about how much more complexity it may
> introduce on top of something that already needs simplification but I guess
> experimentation will really tell you that.
>
> The best thing to do is get a clear idea of your approach and try
> implementing it in a branch after some discussion to help address all the
> issues.
>
> Best,


-- 
Regards,
Cordialement,
Emmanuel L├ęcharny
www.iktek.com


Mime
View raw message