directory-dev mailing list archives

From Quanah Gibson-Mount <qua...@stanford.edu>
Subject Re: Various questions
Date Mon, 05 Jun 2006 17:28:28 GMT


--On Monday, June 05, 2006 5:25 PM -0400 Alex Karasulu 
<aok123@bellsouth.net> wrote:

> Quanah Gibson-Mount wrote:
>
>>
>>
>> --On Monday, June 05, 2006 2:54 PM -0400 Alex Karasulu
>> <aok123@bellsouth.net> wrote:
>>
>>>> I assume it should also handle approx, which is not the same as
>>>> substring.
>>>
>>>
>>> No, as I mentioned before, ApacheDS does not do approximate matching,
>>> so it does not have an option to create approx indices.
>>
>>
>> Right, my point was, I'd assume that should be added, so that it could
>> be supported....
>>
> I don't find the approx matching algorithms based on soundex etc. to be
> all that useful.  Plus the indices get bloated and the server's write
> performance degrades much faster.  Approx indices must generate all
> the variants of a word using these algorithms, which can be many.  Every
> add, del or modify operation then must regenerate these soundex
> derivatives for the old value as well as the new values in the modify op.
> Keep in mind also that some attributes will be multivalued, so the
> explosion can be quite large.
>
> IMO approx match is one of those things that was a good idea but is not
> critical or used all that much.  If we find the time or if you're
> interested you can implement this feature.  For now no index is created
> for approx matching.


I think the concept of applying every index type to every indexed attribute 
is in itself broken.  Having run Stanford's directory service for 7 years, I 
can say we have reasons for indexing particular attributes the way we do.  
In part, it is sometimes to limit the feasibility of certain searches 
(leaving substr off of some attributes, for example).
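
As a rough illustration of per-attribute indexing, here is a minimal Python 
sketch.  The attribute names, the index type labels and the is_indexed 
helper are all hypothetical; this is not ApacheDS's or any other server's 
actual configuration format.

```python
# Hypothetical policy: each attribute lists only the index types we want.
# Leaving "sub" off of uid and mail makes substring searches on those
# attributes unindexed, and therefore expensive to run.
INDEX_POLICY = {
    "uid": {"eq"},
    "mail": {"eq"},
    "sn": {"eq", "sub", "approx"},
}


def is_indexed(attribute, match_type):
    """Return True if the attribute carries an index for this match type."""
    return match_type in INDEX_POLICY.get(attribute.lower(), set())


print(is_indexed("sn", "approx"))
print(is_indexed("mail", "sub"))
```

The point of the sketch is only that the set of index types is chosen per 
attribute, rather than being one global setting.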

In addition, soundex is quite useful for white page lookups, when someone 
knows a last name by sound, but not spelling.
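
To show what that kind of lookup involves, here is a minimal sketch of 
American Soundex in Python together with a toy surname index.  The function 
names and the sample surnames are assumptions for illustration only; this is 
not how ApacheDS or any particular directory server implements approx 
matching.

```python
from collections import defaultdict


def soundex(name):
    """Return the 4-character American Soundex code for a name."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def build_approx_index(surnames):
    """Map each Soundex code to the set of surnames that produce it."""
    index = defaultdict(set)
    for sn in surnames:
        index[soundex(sn)].add(sn)
    return index


index = build_approx_index(["Smith", "Smyth", "Schmidt", "Jones"])
# Someone who only knows the name by sound types a guess at the spelling.
print(sorted(index[soundex("Smithe")]))
```

In this sketch each stored value contributes one key to the index, and a 
misspelled query still lands on the same key as the correctly spelled names.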


In any case, the choice is obviously yours, but I think the reasoning so far 
is flawed.


--Quanah


--
Quanah Gibson-Mount
Principal Software Developer
ITS/Shared Application Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
