lucene-dev mailing list archives

From Doug Cutting
Subject Re: FuzzyQuery prefix length
Date Tue, 12 Oct 2004 15:22:55 GMT
Daniel Naber wrote:
> -It is the only change so far that we cannot express in the API, i.e. we 
> cannot just deprecate a method to make Lucene's users aware of this. So we 
> can only list it in CHANGES.txt, where some people will surely miss it.

We could define a new query parser class with the new behaviour and 
deprecate the old query parser.  I am not advocating this, merely noting 
that it is possible to make this change back-compatibly.

If we agree that this change does make Lucene better (and I'm not sure 
we do) then we should make the change, no?  Back-compatibility is a good 
thing, but, with a major release, should quality suffer because of 
back-compatibility issues?  I hope not.  Rather we should take the 
opportunity of a major release to make Lucene as good as we can.

> -There are words in German like Photokopie/Fotokopie which have the same 
> meaning and a very similar spelling, so people will expect a FuzzyQuery to 
> match such words. But as the difference is in the first two characters it 
> won't be found with the default.
> -People whose index is just 1000 documents large will probably not notice a 
> difference in speed, but they might see a difference in quality (see 
> above). Why should these people change the default instead of those with a 
> 10 mio document index?

Which is worse: a person who searches for Photokopie~ in a 1000 document 
collection does not find documents containing Fotokopie; or a person who 
searches for Photokopie~ in a 1M document collection doesn't find 
anything because it takes too long?  I think some relevant results are 
better than none.  Classes of queries which take orders of magnitude 
longer than others are a problem.
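
To make the trade-off concrete, here is a minimal sketch of prefix-filtered fuzzy matching. It is not Lucene's FuzzyQuery code: the class name PrefixFuzzySketch, its methods, and the fixed maximum edit count are assumptions made for illustration, and the prefix length of 2 is only what the Photokopie/Fotokopie example above implies. The idea is that a term is compared by edit distance only if it shares the first prefixLength characters with the query, so a longer prefix means fewer expensive comparisons but also misses terms that differ near the start.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and thresholds are not Lucene's.
public class PrefixFuzzySketch {

    // Levenshtein distance: minimum number of single-character
    // insertions, deletions, and substitutions turning a into b.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the terms within maxEdits of the query, considering only
    // terms that share the query's first prefixLength characters.
    public static List<String> fuzzyMatches(String query, List<String> terms,
                                            int prefixLength, int maxEdits) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (!term.startsWith(prefix)) {
                continue; // cheap check; skips the edit-distance computation
            }
            if (editDistance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("photokopie", "fotokopie", "photograph");
        // No required prefix: every term is compared, and fotokopie is found.
        System.out.println(fuzzyMatches("photokopie", terms, 0, 2));
        // Prefix of 2: only terms starting with "ph" are compared, so fotokopie is missed.
        System.out.println(fuzzyMatches("photokopie", terms, 2, 2));
    }
}
```

With a prefix length of 0 the loop runs the edit-distance computation against every term in the index, which is where the cost on a large collection comes from; with a non-zero prefix most terms are rejected by the startsWith check alone.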

