directory-users mailing list archives

From Alex Karasulu <akaras...@apache.org>
Subject Re: Is it faster/better to include one objectclass or all in query?
Date Wed, 14 Mar 2012 16:08:16 GMT
On Wed, Mar 14, 2012 at 4:51 PM, <Carlo.Accorsi@ibs-ag.com> wrote:

> Hi, when searching for a user having this objectclass hierarchy
>
> top
>  |_person
>         |_organizationalPerson
>                    |_inetOrgPerson
>
> and uid = 'jsmith'
>
> Which query would be less expensive or better/faster?  Thanks!
>
> (&
>   (objectclass=inetOrgPerson)
>   (uid=jsmith)
> )
>

This one would be faster and more efficient, since the evaluation is against
the most specific objectClass, which reduces the search space from the start.

To understand this you need to know how the optimizer works with scan
counts. LDAP search filters are expanded into an AST (abstract syntax tree)
whose leaves are assertions and whose branch nodes are operators. The
optimizer then annotates this AST with scan counts, which basically means
asking each index, "Hey, how many results would you return for this
assertion?" The more specific inetOrgPerson is likely to return a smaller
scan count than its superclasses.
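
A minimal sketch of that annotation step, in Python rather than the
server's own code. The Leaf and And classes, the INDEX_COUNTS numbers and
the rule that an AND node takes the smallest count of its children are
all assumptions made for illustration, not the actual optimizer:

```python
from dataclasses import dataclass
from typing import List, Optional, Union

TOTAL_ENTRIES = 100_000  # hypothetical size of the directory

# Hypothetical index statistics: (attribute, value) -> matching entries.
INDEX_COUNTS = {
    ("objectclass", "top"): 100_000,
    ("objectclass", "person"): 40_000,
    ("objectclass", "organizationalperson"): 30_000,
    ("objectclass", "inetorgperson"): 25_000,
    ("uid", "jsmith"): 1,
}


@dataclass
class Leaf:
    """A single assertion such as (uid=jsmith)."""
    attr: str
    value: str
    scan_count: Optional[int] = None


@dataclass
class And:
    """An AND operator node; its children are leaves or nested ANDs."""
    children: List[Union["And", Leaf]]
    scan_count: Optional[int] = None


def annotate(node):
    """Attach a scan count to every node of the filter tree."""
    if isinstance(node, Leaf):
        key = (node.attr.lower(), node.value.lower())
        # Assumption: with no index statistics, expect a full scan.
        node.scan_count = INDEX_COUNTS.get(key, TOTAL_ENTRIES)
    else:
        for child in node.children:
            annotate(child)
        # Assumption: an AND matches at most as many entries as its
        # most selective child.
        node.scan_count = min(child.scan_count for child in node.children)
    return node


tree = annotate(And([Leaf("objectClass", "inetOrgPerson"),
                     Leaf("uid", "jsmith")]))
for leaf in tree.children:
    print(f"({leaf.attr}={leaf.value}) scan count: {leaf.scan_count}")
```

With these made-up numbers the inetOrgPerson leaf gets a far smaller count
than top or person would, and the uid leaf is smaller still.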

Now if you have an index on uid, the scan count for it will be 1, since
uid should be unique (our DSA does not enforce this, though). Once the
optimizer is done annotating, one leaf node in the entire AST is selected
to act as the candidate generator and drives the iteration. The leaf node
with the smallest scan count is selected, because it is cheaper to iterate
over and look up a few candidates than many. The remaining leaf assertions
are handled by lookup-based assertion evaluators.

So in this case, with a uid index, the server uses uid=jsmith to return
one candidate and then does a lookup to check whether that candidate also
matches objectClass=inetOrgPerson. Here I would just use (uid=jsmith)
since you have the uid index. That avoids the extra lookup to check
whether the entry is an inetOrgPerson. If uids are unique and your people
are all inetOrgPersons, this is the best filter for you.
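
The selection of a candidate generator can be sketched roughly as follows.
This is a toy in-memory model, not the server's implementation: the
ENTRIES data, the dictionary-based indexes and the search_and function are
all hypothetical.

```python
# Toy directory: entry id -> attributes (hypothetical data).
ENTRIES = {
    1: {"uid": {"jsmith"},
        "objectclass": {"top", "person", "organizationalPerson",
                        "inetOrgPerson"}},
    2: {"uid": {"adoe"},
        "objectclass": {"top", "person", "organizationalPerson",
                        "inetOrgPerson"}},
    3: {"uid": {"printer1"},
        "objectclass": {"top", "device"}},
}


def build_index(attr):
    """Map each lower-cased value of attr to the ids of entries holding it."""
    index = {}
    for entry_id, entry in ENTRIES.items():
        for value in entry.get(attr, ()):
            index.setdefault(value.lower(), set()).add(entry_id)
    return index


INDEXES = {attr: build_index(attr) for attr in ("uid", "objectclass")}


def candidates(attr, value):
    """Entry ids the index returns for one assertion."""
    return INDEXES[attr].get(value.lower(), set())


def search_and(assertions):
    """Evaluate an AND of (attr, value) assertions.

    The assertion with the smallest scan count generates candidates;
    the others are only checked by lookup for each candidate.
    """
    driver = min(assertions, key=lambda a: len(candidates(*a)))
    others = [a for a in assertions if a is not driver]
    matches = [
        entry_id
        for entry_id in sorted(candidates(*driver))
        if all(entry_id in candidates(*other) for other in others)
    ]
    return driver, matches


driver, matches = search_and([("objectclass", "inetOrgPerson"),
                              ("uid", "jsmith")])
print("candidate generator:", driver)
print("matching entry ids:", matches)
```

In this model the uid assertion drives the search because its index
returns one id, while the objectClass index returns two; dropping the
objectClass assertion removes the per-candidate lookup entirely.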

If you do not have an index on uid, I suggest you add one. If you don't,
the candidates will be generated from the objectClass index, which always
exists since it is a system index. The server will then iterate through
the entire set of inetOrgPersons in your DIB, deserialize each entry from
the master table, and check (after normalizing the uid attribute) whether
it is in fact equal to jsmith. That set could be huge.
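
To make the cost of the unindexed case concrete, here is a rough sketch
under stated assumptions: the master table is modelled as a dictionary of
JSON strings, normalization is plain trimming and lower-casing, and all
names and data are hypothetical rather than taken from the server.

```python
import json

# Hypothetical master table: entry id -> serialized entry.
MASTER_TABLE = {
    1: json.dumps({"uid": ["adoe"], "objectclass": ["inetOrgPerson"]}),
    2: json.dumps({"uid": ["JSmith "], "objectclass": ["inetOrgPerson"]}),
    3: json.dumps({"uid": ["bwayne"], "objectclass": ["inetOrgPerson"]}),
}

# The objectClass system index is the only index available here.
OBJECTCLASS_INDEX = {"inetorgperson": [1, 2, 3]}


def normalize(value):
    """Stand-in for attribute normalization: trim and lower-case."""
    return value.strip().lower()


def search_without_uid_index(uid):
    """Scan every inetOrgPerson and compare its uid values to the target.

    Returns the matching ids and the number of entries that had to be
    fetched and deserialized from the master table.
    """
    target = normalize(uid)
    matches = []
    fetched = 0
    for entry_id in OBJECTCLASS_INDEX["inetorgperson"]:
        entry = json.loads(MASTER_TABLE[entry_id])  # deserialize the entry
        fetched += 1
        if any(normalize(v) == target for v in entry.get("uid", [])):
            matches.append(entry_id)
    return matches, fetched


matches, fetched = search_without_uid_index("jsmith")
print("matches:", matches, "entries deserialized:", fetched)
```

Every inetOrgPerson is deserialized to find a single match, so the work
grows with the number of people in the DIB instead of staying at one
index lookup.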

So index your uids, and don't bother with the objectClass assertions if you
don't vary the objectClass of the people in your DIB.

Cheers,
Alex


>
> OR
>
> (&
>   (&
>     (objectclass=top)
>     (objectclass=person)
>     (objectclass=organizationalPerson)
>     (objectclass=inetOrgPerson)
>   )
>   (uid=jsmith)
> )
>
>
>


-- 
Best Regards,
-- Alex
