directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ludovic Poitou <>
Subject Re: Substring filter : how to improve their use ?
Date Fri, 22 Feb 2013 07:59:45 GMT
Hi Emmanuel,

You might want to eval the scope after the filter rather than initially.
In most real life deployments, applications are searching for a single user, like (uid=elecharny),
as a subtree through the whole directory. The scope becomes irrelevant then.
So checking the scope first adds processing of no value.

Just my 2 cents.


Ludovic Poitou

On Thursday, February 21, 2013 at 19:14 , Emmanuel L├ęcharny wrote:

> Hi guys,
> currently, when we do a search, we try to evaluate the number of
> candidates each node in the filter expression will return. For instance,
> such a filter :
> (&(cn=test)(sn=acme))
> will evaluate the number of entries with a CN = test and the number of
> entries with a SN = acme. Let say the former has 10 entries and the last
> has 5 entries, we will annotate each node with this number :
> (&:[5](cn=test:[10])(sn=acme:[5]))
> Here, the & will have the lowest number, and at the end, we will select
> the node whith the lowest number as a base to fetch the entries.
> So far, so good (I don't ant to go any further about how we evaluate
> other operators, like | and !, it's way more complex but irrelevant here).
> Every filter will also have an additional node added : the scope. The
> scope allows us to limit the number of entries to fetch, depending on
> where in the tree we are. That can save some time.
> In our example, let's say we have 1000 entries under ou=system, 100
> under ou=users,ou=system and 3 under ou=unitTest,ou=users,ou=system. The
> filter will be transformed internally to this filter (assuming the scope
> is SUBTREE) :
> (&(scope=SUB)(cn=test)(sn=acme))
> and annoted as :
> base search = "ou=system" :
> (&:[5](scope=SUB[1000](cn=test:[10])(sn=acme:[5]))
> base search = "ou=users,ou=system" :
> (&:[5](scope=SUB[100](cn=test:[10])(sn=acme:[5]))
> base search = "ou=unitTest,ou=users,ou=system" :
> (&:[3](scope=SUB[3](cn=test:[10])(sn=acme:[5]))
> As we can see, the scope can change the evaluation. In the last case, we
> will pick the scope node to fetch the entries, as we will only get 3 of
> them.
> Ok, let's go a step further. Those filters were pretty simple to handle,
> as the underlying database know how many entries will be fetched when
> handling a filter like (cn=test).
> Substring, <=, >= filters are way different. We can't know many entries
> will be >= a key, or which match the substring. For instance, how do we
> know how many entries will match cn=te* ? We can find the first key
> matching cn=te in the index, but we will have to walk the index up (or
> down) to find the entries matching cn=tf*. That means we have acess to
> the ordering method, to build the key which comes after 'te' (ie, 'tf'),
> something which is quite difficult to do atm...
> So...
> I modified the evaluation for the substring/lesserThen/greaterThan
> indexed filters, to use a count which is different from the number of
> element in the index. Right now, the count will always be 10. Why 10 ?
> Why a fixed count ?
> - A fixed count is assuming that the user *knows* what he/she is doing
> when doing a search. Of course, some might do things like (cn=a*), which
> might return thousands of candidate, but let's say it's not likely.
> - 10 because it gives a chance that a better index can be selected.
> I don't really like the '10' value, but it works well. Keep in mind that
> it's just a lower bound. That does not mean the filter will just return
> 10 candidate max ! It's just a way for the optimizer to select the
> substring index instead of defaulting to something else with tens of
> thousands of candidates.
> May be 100 would be better...
> Keep in mind that if there are only 2 candidates that match the filter,
> we will have the filter selected with 10 as a max count, but we will not
> fetch 10 candiates, just 2 !
> I have modified the server to work this way, if you have better ideas,
> I'm listening !
> --  
> Regards,
> Cordialement,
> Emmanuel L├ęcharny
> (  

View raw message