lucene-dev mailing list archives

From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: FuzzyQuery prefix length
Date Wed, 20 Oct 2004 17:45:59 GMT
Doug Cutting wrote:

> Daniel Naber wrote:
>
>> On Tuesday 12 October 2004 17:22, Doug Cutting wrote:
>>
>>> Which is worse: a person who searches for Photokopie~ in a 1000
>>> document collection does not find documents containing Fotokopie; or
>>> a person who searches for Photokopie~ in a 1M document collection
>>> doesn't find anything because it takes too long.  I think some
>>> relevant results are better than none.
>>
>>
>> I disagree, as the user who doesn't get the "Fotokopie" matches will
>> not understand what's going on. He will assume that there are no such
>> documents, which is wrong. If there's a timeout the user will at
>> least notice something is wrong. Besides that, it's the developer's
>> responsibility to get things fast enough. If he decides to do so with
>> a prefix that might be okay for his use case.
>
My personal opinion, plus the experience I've gained over the last years
in the area of information retrieval, favors Daniel's idea of setting
the prefix length to 0 by default. My arguments are:

1) Most developers using Lucene, either as a basis for or as an
enhancement to their own products, will deal with an index no bigger
than 10,000 documents. This group of developers is happy with an API
that is easy to use and does exactly what they expect. They don't worry
about internal features and just use it the way they got it. With such
an index size, they will never run into a timeout or performance
problem, and they're happy to find all documents matching a fuzzy query.

2) Developers handling large document collections with more than 1M docs
will study the possibilities and options Lucene offers to optimize
their system. They will find the knob that has to be turned when they
run into timeouts or memory problems. If not, they will ask the
community for a hint.

3) I would keep the functional behavior of Lucene backward compatible in
future versions as far as possible. It's no problem to change the API,
deprecate methods, and so on. Modern development environments show
deprecation warnings, which helps developers update their software. But
they can't help us if the query results differ when moving from
Lucene 1.4 to Lucene 1.9.
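
To make the trade-off concrete, here is a minimal sketch of how a
required prefix prunes fuzzy candidates. It is not Lucene's
implementation: the class FuzzyPrefixSketch, the method fuzzyMatches and
the maxEdits cutoff are names and simplifications assumed for
illustration (Lucene scores by a minimum similarity rather than a raw
edit count). The terms are lowercased as an analyzer typically would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy term list stands in for the index's term dictionary.
public class FuzzyPrefixSketch {

    // Levenshtein edit distance using two rolling rows of the DP table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the terms within maxEdits of the query. Terms that do not share
    // the first prefixLength characters are skipped before any distance is
    // computed, which is where the speedup (and the lost matches) come from.
    static List<String> fuzzyMatches(String query, List<String> terms,
                                     int prefixLength, int maxEdits) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (!term.startsWith(prefix)) {
                continue;
            }
            if (editDistance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms =
            Arrays.asList("fotokopie", "photokopie", "photokopien", "telefon");
        // Prefix length 0: every term is compared, so "fotokopie" is found.
        System.out.println(fuzzyMatches("photokopie", terms, 0, 2));
        // prints [fotokopie, photokopie, photokopien]
        // Prefix length 2: only terms starting with "ph" are compared.
        System.out.println(fuzzyMatches("photokopie", terms, 2, 2));
        // prints [photokopie, photokopien]
    }
}
```

With a prefix length of 0 the "fotokopie" term is returned, at the cost
of comparing against every term; with a prefix length of 2 it silently
disappears from the results, which is the behavior change argued against
in point 3.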

Bernhard







---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

