directory-users mailing list archives

From <Carlo.Acco...@ibs-ag.com>
Subject RE: Is it faster/better to include one objectclass or all in query?
Date Wed, 14 Mar 2012 16:14:02 GMT
Alex - Thank you for your detailed description of the search algorithm. This is most helpful.



Regards,
Carlo Accorsi

-----Original Message-----
From: akarasulu@gmail.com [mailto:akarasulu@gmail.com] On Behalf Of Alex Karasulu
Sent: Wednesday, March 14, 2012 12:08 PM
To: users@directory.apache.org
Subject: Re: Is it faster/better to include one objectclass or all in query?

On Wed, Mar 14, 2012 at 4:51 PM, <Carlo.Accorsi@ibs-ag.com> wrote:

> Hi, when searching for a user having this objectclass hierarchy
>
> top
>  |_person
>         |_organizationalPerson
>                    |_inetOrgPerson
>
> and uid = 'jsmith'
>
> Which query would be less expensive or better/faster?  Thanks!
>
> (&
>   (objectclass=inetOrgPerson)
>   (uid=jsmith)
> )
>

This would be faster and more efficient, since the evaluation is on a more specific objectClass,
which reduces the search space from the get-go.

To understand this you need to know how the optimizer works with the scan counts that are
returned. LDAP search filters are expanded into an AST (abstract syntax tree), with the leaves
of the tree being assertions and the branch nodes being operators. The optimizer then annotates
this AST with scan counts, which basically means asking each index, "Hey, how many results would
you return for this assertion?" So the more specific inetOrgPerson is likely to return
a smaller scan count.
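
A minimal sketch of the annotation step described above, written as a toy Python model and not
taken from the ApacheDS code. The class names (Leaf, And), the INDEXES layout and the sample
entry IDs are all made up for illustration, and taking the minimum over the children of an AND
node is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy indexes: attribute -> normalized value -> set of entry IDs.
INDEXES = {
    "objectclass": {
        "top": {1, 2, 3, 4, 5},
        "person": {1, 2, 3, 4},
        "organizationalperson": {1, 2, 3},
        "inetorgperson": {1, 2},
    },
    "uid": {"jsmith": {1}, "bjones": {2}},
}


@dataclass
class Leaf:
    """An equality assertion such as (uid=jsmith)."""
    attr: str
    value: str
    count: Optional[int] = None


@dataclass
class And:
    """An AND operator node over child nodes."""
    children: List[object]
    count: Optional[int] = None


def annotate(node):
    """Attach a scan count to every node of the filter tree."""
    if isinstance(node, Leaf):
        # Ask the index how many entries it would return for this assertion.
        # Only indexed attributes are modeled in this sketch.
        index = INDEXES[node.attr.lower()]
        node.count = len(index.get(node.value.lower(), ()))
    else:
        for child in node.children:
            annotate(child)
        # Assumption: an AND can match no more than its most selective child.
        node.count = min(child.count for child in node.children)
    return node


tree = annotate(And([Leaf("objectClass", "inetOrgPerson"), Leaf("uid", "jsmith")]))
print([child.count for child in tree.children], tree.count)  # prints [2, 1] 1
```

In this toy data, (objectClass=top) would get a scan count of 5 and (objectClass=inetOrgPerson)
a scan count of 2, which is the sense in which the more specific class narrows the search space.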

Now if you have an index on uid then the scan count on this will be 1, since uid should be
unique (our DSA does not enforce this, though). Once the optimizer is done annotating, one
leaf node in the entire AST is selected to act as the candidate generator and is used for
iteration. The leaf node with the smallest scan count is selected for this. The driving reason
is that it is cheaper to iterate over and look up fewer candidates than more. The
rest of the leaf assertion nodes are used by lookup-based assertion evaluators. So in this
case, with a uid index, the server will use uid=jsmith to return one candidate and then do a
lookup to see if the returned candidate is also matched by objectClass=inetOrgPerson. In this case
I would just use (uid=jsmith) since you have the uid index. It avoids the need for another
lookup to check if it's an inetOrgPerson. If uids are unique and your peeps are inetOrgPersons
then this is the best filter for you.
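
The selection step above can be sketched roughly as follows. This is a toy Python model for a
flat AND filter only, not the server's implementation; the function names, the INDEXES layout
and the sample entry IDs are hypothetical.

```python
# Hypothetical toy indexes: attribute -> normalized value -> set of entry IDs.
INDEXES = {
    "objectclass": {"inetorgperson": {1, 2, 3}, "person": {1, 2, 3, 4}},
    "uid": {"jsmith": {1}, "bjones": {2}},
}


def scan_count(attr, value):
    """How many entries would the index return for (attr=value)?"""
    return len(INDEXES[attr].get(value, ()))


def search_and(assertions):
    """Evaluate a flat AND over (attr, value) pairs.

    Returns (driver, matching entry IDs, number of lookups performed).
    """
    # The assertion with the smallest scan count generates the candidates.
    driver = min(assertions, key=lambda a: scan_count(*a))
    rest = [a for a in assertions if a is not driver]

    matches, lookups = [], 0
    for entry_id in sorted(INDEXES[driver[0]].get(driver[1], ())):
        ok = True
        for attr, value in rest:
            # Every remaining assertion is checked by lookup, not by iteration.
            lookups += 1
            if entry_id not in INDEXES[attr].get(value, ()):
                ok = False
                break
        if ok:
            matches.append(entry_id)
    return driver, matches, lookups


both = search_and([("objectclass", "inetorgperson"), ("uid", "jsmith")])
uid_only = search_and([("uid", "jsmith")])
print(both)      # prints (('uid', 'jsmith'), [1], 1)
print(uid_only)  # prints (('uid', 'jsmith'), [1], 0)
```

Both filters are driven by uid=jsmith and return the same entry here; the uid-only filter just
skips the one extra objectClass lookup.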

If you do not have an index on uid, I suggest you index it. If you don't, the candidates
will be generated off the objectClass index, which always exists since it is a system index.
The server will then iterate through the entire set of inetOrgPersons in your DIB, de-serialize
each entry from the master table, and then check (after normalizing the uid attribute) whether
it is in fact equal to jsmith. That set could be huge, so this can get expensive.
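
To make the cost of the unindexed case concrete, here is a rough Python sketch. It is only a
toy model: JSON stands in for whatever serialization the master table really uses, normalize()
is a simplified stand-in for the real attribute normalization, and MASTER, OC_INDEX and the
sample entries are invented for the example.

```python
import json

# Hypothetical master table: entry ID -> serialized entry.
MASTER = {
    1: json.dumps({"uid": "JSmith", "objectclass": ["inetOrgPerson"]}),
    2: json.dumps({"uid": "bjones", "objectclass": ["inetOrgPerson"]}),
    3: json.dumps({"uid": "akim", "objectclass": ["inetOrgPerson"]}),
}

# Hypothetical objectClass index: normalized value -> set of entry IDs.
OC_INDEX = {"inetorgperson": {1, 2, 3}}


def normalize(value):
    """Simplified normalization: trim and lowercase."""
    return value.strip().lower()


def search_without_uid_index(object_class, uid):
    """Scan every entry of the given objectClass and compare uid values.

    Returns (matching entry IDs, number of entries de-serialized).
    """
    matches, deserialized = [], 0
    for entry_id in sorted(OC_INDEX.get(normalize(object_class), ())):
        entry = json.loads(MASTER[entry_id])  # one de-serialization per candidate
        deserialized += 1
        if normalize(entry.get("uid", "")) == normalize(uid):
            matches.append(entry_id)
    return matches, deserialized


print(search_without_uid_index("inetOrgPerson", "jsmith"))  # prints ([1], 3)
```

In this sketch the work grows with the number of inetOrgPerson entries rather than with the
number of matches: three entries are de-serialized to find one.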

So index your uids and don't bother with the objectClass stuff if you don't vary the OC of
the people in your DIB.

Cheers,
Alex


>
> OR
>
> (&
>   (&
>     (objectclass=top)
>     (objectclass=person)
>     (objectclass=organizationalPerson)
>     (objectclass=inetOrgPerson)
>   )
>   (uid=jsmith)
> )
>
>
>


--
Best Regards,
-- Alex
