directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kiran Ayyagari <kayyag...@apache.org>
Subject Re: Substring filter : how to improve their use ?
Date Fri, 22 Feb 2013 04:04:59 GMT
On Thu, Feb 21, 2013 at 11:44 PM, Emmanuel L├ęcharny <elecharny@gmail.com>wrote:

> Hi guys,
>
> currently, when we do a search, we try to evaluate the number of
> candidates each node in the filter expression will return. For instance,
> such a filter :
>
> (&(cn=test)(sn=acme))
>
> will evaluate the number of entries with a CN = test and the number of
> entries with a SN = acme. Let say the former has 10 entries and the last
> has 5 entries, we will annotate each node with this number :
>
>
> (&:[5](cn=test:[10])(sn=acme:[5]))
>
> Here, the & will have the lowest number, and at the end, we will select
> the node whith the lowest number as a base to fetch the entries.
>
> So far, so good (I don't ant to go any further about how we evaluate
> other operators, like | and !, it's way more complex but irrelevant here).
>
> Every filter will also have an additional node added : the scope. The
> scope allows us to limit the number of entries to fetch, depending on
> where in the tree we are. That can save some time.
>
> In our example, let's say we have 1000 entries under ou=system, 100
> under ou=users,ou=system and 3 under ou=unitTest,ou=users,ou=system. The
> filter will be transformed internally to this filter (assuming the scope
> is SUBTREE) :
>
> (&(scope=SUB)(cn=test)(sn=acme))
>
> and annoted as :
>
>
> base search = "ou=system" :
> (&:[5](scope=SUB[1000](cn=test:[10])(sn=acme:[5]))
>
> base search = "ou=users,ou=system" :
> (&:[5](scope=SUB[100](cn=test:[10])(sn=acme:[5]))
>
> base search = "ou=unitTest,ou=users,ou=system" :
> (&:[3](scope=SUB[3](cn=test:[10])(sn=acme:[5]))
>
> As we can see, the scope can change the evaluation. In the last case, we
> will pick the scope node to fetch the entries, as we will only get 3 of
> them.
>
> Ok, let's go a step further. Those filters were pretty simple to handle,
> as the underlying database know how many entries will be fetched when
> handling a filter like (cn=test).
>
> Substring, <=, >= filters are way different. We can't know many entries
> will be >= a key, or which match the substring. For instance, how do we
> know how many entries will match cn=te* ? We can find the first key
> matching cn=te in the index, but we will have to walk the index up (or
> down) to find the entries matching cn=tf*. That means we have acess to
> the ordering method, to build the key which comes after 'te' (ie, 'tf'),
> something which is quite difficult to do atm...
>
> so a filter like (&(cn=te*)(sn=ac*)) with SUBTREE scope will transform to
something like

(&:[3](scope=SUB[3](cn=te*:[10])(sn=ac*:[10]))

is that correct?


So...
>
> I modified the evaluation for the substring/lesserThen/greaterThan
> indexed filters, to use a count which is different from the number of
> element in the index. Right now, the count will always be 10. Why 10 ?
> Why a fixed count ?
>
> - A fixed count is assuming that the user *knows* what he/she is doing
> when doing a search. Of course, some might do things like (cn=a*), which
> might return thousands of candidate, but let's say it's not likely.
> - 10 because it gives a chance that a better index can be selected.
>
> I don't really like the '10' value, but it works well. Keep in mind that
> it's just a lower bound. That does not mean the filter will just return
> 10 candidate max ! It's just a way for the optimizer to select the
> substring index instead of defaulting to something else with tens of
> thousands of candidates.
> May be 100 would be better...
>
> Keep in mind that if there are only 2 candidates that match the filter,
> we will have the filter selected with 10 as a max count, but we will not
> fetch 10 candiates, just 2 !
>
> I have modified the server to work this way, if you have better ideas,
> I'm listening !
>
> --
> Regards,
> Cordialement,
> Emmanuel L├ęcharny
> www.iktek.com
>
>


-- 
Kiran Ayyagari
http://keydap.com

Mime
View raw message