lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: FuzzyQuery prefix length
Date Tue, 26 Oct 2004 19:33:53 GMT
Erik Hatcher wrote:
> On Oct 20, 2004, at 12:14 PM, Doug Cutting wrote:
> 
>> The advantages of a zero-character prefix default are that it's 
>> backward-compatible and that it will find more matches when spelling 
>> differences are in the first characters.
>>
> 
> I prefer this default.
> 
> Anyone using QueryParser needs to be aware of the issues of exposing 
> fuzzy queries, range queries, and any other types the syntax supports.  
> It would not be Lucene's fault if a system with millions of documents is 
> exposed through QueryParser and fuzzy queries take a bit longer or 
> throw a TooManyClauses exception.

I am clearly outvoted.  I still disagree, but will not veto this.

My last words on the topic (I promise!): In designing Lucene I tried 
hard to only add features that were scalable.  For example, one could 
easily implement a RegexQuery that scans text of stored fields, 
returning those which match a regex.  This would provide grep-like 
functionality, which some folks might find useful.  But it would not be 
scalable.  If someone contributed such a thing I would lobby against 
permitting its use from QueryParser in the default configuration.  The 
query parser already requires an initial character before a wildcard, in 
order to make this operator more scalable.  I don't see why fuzzy 
queries should be treated differently, or why we should permit such a 
huge scalability hole in the default configuration.
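A minimal sketch of the prefix-length trade-off, in Java. It is a hypothetical illustration, not Lucene's FuzzyQuery code: the class PrefixFuzzySketch, its methods, the comparisons counter, and the toy term list are all invented here, and the term dictionary is modeled, under that assumption, as a sorted TreeSet. With a prefix length of 0 every term has to be compared against the query term. With a non-zero prefix, sorted order lets enumeration start at the prefix and stop at the first term that lacks it. The other side of the trade-off shows up too: a typo in the first character ("kucene") is only caught when the prefix length is 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration of prefix-restricted fuzzy matching; not Lucene code.
public class PrefixFuzzySketch {
    // Number of edit-distance computations done by the most recent fuzzyMatches call.
    static int comparisons = 0;

    // Levenshtein distance, keeping only two rows of the DP table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Terms within maxEdits of target, considering only terms that share
    // the first prefixLength characters of target.
    static List<String> fuzzyMatches(NavigableSet<String> terms, String target,
                                     int maxEdits, int prefixLength) {
        comparisons = 0;
        String prefix = target.substring(0, Math.min(prefixLength, target.length()));
        List<String> matches = new ArrayList<>();
        // Start at the first term >= prefix; in sorted order, all terms with the
        // prefix are contiguous, so stop at the first one without it.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break;
            comparisons++;
            if (editDistance(term, target) <= maxEdits) matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of(
                "apple", "applet", "banana", "lucene", "lucent", "lupine", "nucleus"));
        // Prefix length 0 scans the whole dictionary.
        System.out.println(fuzzyMatches(terms, "lucene", 1, 0) + " comparisons=" + comparisons);
        // prints [lucene, lucent] comparisons=7
        // Prefix length 2 only scans terms starting with "lu".
        System.out.println(fuzzyMatches(terms, "lucene", 1, 2) + " comparisons=" + comparisons);
        // prints [lucene, lucent] comparisons=3
        // A first-character typo is found only with prefix length 0.
        System.out.println(fuzzyMatches(terms, "kucene", 1, 0) + " comparisons=" + comparisons);
        // prints [lucene] comparisons=7
        System.out.println(fuzzyMatches(terms, "kucene", 1, 1) + " comparisons=" + comparisons);
        // prints [] comparisons=0
    }
}
```

On a dictionary of millions of terms, the prefix-0 case does one edit-distance computation per term, which is the cost Doug is objecting to; a longer prefix shrinks the scanned range at the price of missing typos in the leading characters.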

Doug
