directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lecharny <elecha...@gmail.com>
Subject Re: [ApacheDS] OR expression optimizations for Cursors and Evaluators
Date Fri, 28 Mar 2008 11:06:31 GMT
Ok, I've missed it too :)

While discussing with Pierre-Arnaud yesturday during our lunch, we have 
had quite an interesting idea. Here it is :

in a filter, we have many assertions, including the scope assertion, 
which is not explicit (for those who don't know what I'm talking about, 
the scope is the subtree on which we do a search. This is the baseDN for 
the search, combined with the level (entry, child or subtree)).

For instance, searching for all the entries for an java programmer in 
florida  being engineer, the filter is :
(&(l=jacksonville)(ou=engineering))

Considering we are searching in the US, the scope could be 
c=US,dc=acme,dc=com, we will limit the search in this branch. This is 
done by adding internally a new assertion in the filter :

(&(ou=java)(ou=engineering)(scope.subtree="c=US,dc=acme,dc=com"))

<note>
This is not exactly how it is done in the server, it's just a 10K feet 
presentation of the principles
</note>

Now, the search engine do a computation on this filter to determinate 
which index should be used. Suppose that the ou=java links to 10000 
persons, and that the ou=engineering links to 1000 persons, we will use 
the ou=engineering attribute to limit the number of candidate to bring 
back from the backend, and check with the two other assertions to 
eliminate the bad candidates. We do that by enumerating the number of 
elements associated with eac assertion :

(&(10000:ou=java)(1000:ou=engineering)(MAX_INT:scope.subtree="c=US,dc=acme,dc=com"))

Currently, the scope assertion is just used to eliminate the candidates 
which are found by the search engine, we don't 'count' the number of 
candidates under a position on the tree (so the MAX_INT value). In our 
sample, if we suppose that the scope is containing 100 entries, we 
immediatly see that we are not using the correct assertion to bring back 
the entries from the backend.

Moreover if we apply the scope to the 'ou' index, we may see that we 
have only 10 java coders under the c=US,dc=acme,dc=com branch, instead 
of 10000 (not likely, but who knows  ? :)

So the idea we had was to use the hierarchy index to add some 
information about the other index. Here, we may have the number of java 
coder in the c=US,dc=acme,dc=com branch stored under the hierarchy index 
for the key 'c=US,dc=acme,dc=com'. We store all the counst for each 
index into this Hierarcy index.
As it will be modified when we modify the other index, this might cost a 
bit on modifications, deletions and additions, but this will have a huge 
impact on the search requests, which is what we target.

In our example, if we suppose that US has only 10 java coders, the 
annotated filter will be :

(&(10:ou=java)(200:ou=engineering)(500:scope.subtree="c=US,dc=acme,dc=com"))

assuming that we also have 200 engineering in the branch, and that the 
scope will bring 500 entries.

Now, we will just bring back 10 candidates from the backend instead of 1000.

This is really a gross example, but I think it gives the idea.

Is it totally insane, or does it makes some sense ?

Thanks !

Alex Karasulu wrote:
> As I try to implement this I am beginning to see some serious issues 
> with the heuristic cited in this document.  The problem is 
> sub-expression Cursors in the OR Cursor can be on anything: a logical 
> expression or on an attribute Index for a simple leaf assertion.  When 
> on an index, the results of the Cursor are sorted by the key (value of 
> the attribute) of the index and not by ID.
>
> The children of an OR Cursor can be on any attribute index.  This 
> means the Cursor results are not sorted by id as supposed by this 
> heuristic.  Instead they're sorted by attribute value.  This will lead 
> to inconsistent results.
>
> This heuristic is useless :(.  What a waste!  It would have been so 
> efficient.
>
> Alex
>
> On Thu, Mar 27, 2008 at 11:31 AM, Alex Karasulu <akarasulu@apache.org 
> <mailto:akarasulu@apache.org>> wrote:
>
>     On Thu, Mar 27, 2008 at 3:59 AM, Emmanuel Lecharny
>     <elecharny@gmail.com <mailto:elecharny@gmail.com>> wrote:
>
>         Hi Alex,
>
>
>         Alex Karasulu wrote:
>         >
>         >
>         http://cwiki.apache.org/confluence/display/DIRxSRVx11/Cursors+and+Evaluators+on+Logical+Disjunctions
>
>         I have read and checked the wiki page. I have nothing to add
>
>
>     Cool.  BTW I documented that sweet optimization you cited a week
>     ago for rapid shorting of child expression evaluations.  I think
>     this will be a significant optimization.
>      
>
>         FYI, I'm slowly coming back but I still have some side things
>         to deal with.
>
>
>     No worries.  BTW I'm thinking we can just do a quick ADS release
>     with only the top handful of issues that are critical for this
>     1.5.2 feature release.
>
>     Alex
>
>


-- 
--
cordialement, regards,
Emmanuel L├ęcharny
www.iktek.com
directory.apache.org



Mime
View raw message