lucene-dev mailing list archives

From Erik Hatcher
Subject Re: FuzzyQuery prefix length
Date Tue, 26 Oct 2004 20:29:19 GMT
On Oct 26, 2004, at 3:33 PM, Doug Cutting wrote:
> Erik Hatcher wrote:
>> On Oct 20, 2004, at 12:14 PM, Doug Cutting wrote:
>>> The advantages of a zero-character prefix default are that it's 
>>> back-compatible and that it will find more matches, when spelling 
>>> differences are in the first characters.
>> I prefer this default.
>> Anyone using QueryParser needs to be aware of the issues of exposing 
>> fuzzy queries, range queries, and any other types the syntax 
>> supports.  It would not be Lucene's fault if a system with millions 
>> of documents is exposed through QueryParser and fuzzy queries take a 
>> bit longer or throw a TooManyClauses exception.
> I am clearly outvoted.  I still disagree, but will not veto this.
> My last words on the topic (I promise!): In designing Lucene I tried 
> hard to only add features that were scalable.  For example, one could 
> easily implement a RegexQuery that scans text of stored fields, 
> returning those which match a regex.  This would provide grep-like 
> functionality, which some folks might find useful.  But it would not 
> be scalable.  If someone contributed such a thing I would lobby 
> against permitting its use from QueryParser in the default 
> configuration.  The query parser already requires an initial character 
> before a wildcard, in order to make this operator more scalable.  I 
> don't see why fuzzy queries should be treated differently, why we 
> permit such a huge scalability hole in the default configuration.

I agree completely with your sentiment.  I personally would be happy 
if QueryParser weren't part of Lucene altogether - it sure did make 
writing the book much harder, that's for sure!

But with a wildcard query that requires an initial character, at least 
the results you get back are completely accurate.  With a fuzzy 
query and a required prefix, that is not necessarily the case, 
given the examples I've seen on here: a term whose spelling difference 
falls inside the prefix is never matched.
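
To make the accuracy point concrete, here is a minimal sketch of how a
required prefix changes what a fuzzy match can find.  It is not Lucene's
FuzzyQuery: the class FuzzyPrefixSketch, the fixed maxEdits cutoff, and
the sample terms are all assumptions made for illustration, and the
matching rule (shared prefix plus plain Levenshtein distance) is a
simplification of whatever similarity scoring a real index would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FuzzyPrefixSketch {

    // Classic Levenshtein distance: insertions, deletions, substitutions.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps terms that share the first prefixLength characters of the query
    // and are within maxEdits of it.  Terms failing the prefix test are
    // skipped without computing a distance, which is where the speedup
    // (and the loss of matches) comes from.
    public static List<String> fuzzyMatch(List<String> terms, String query,
                                          int prefixLength, int maxEdits) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> hits = new ArrayList<String>();
        for (String term : terms) {
            if (!term.startsWith(prefix)) {
                continue;
            }
            if (editDistance(term, query) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical term dictionary.
        List<String> terms = Arrays.asList("lucene", "lucent", "lucena", "kucene", "search");
        // prints [lucene, lucent, lucena, kucene]
        System.out.println(fuzzyMatch(terms, "lucene", 0, 1));
        // prints [lucene, lucent, lucena]
        System.out.println(fuzzyMatch(terms, "lucene", 1, 1));
    }
}
```

With a prefix length of 1, "kucene" is dropped even though it is a
single edit away from the query, because its difference is in the first
character.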

Perhaps for Lucene 2.0 we can gut QueryParser and have some type of 
pluggable syntax handlers, so that inefficient queries like fuzzy 
and wildcard are not possible by default, but could be turned on 
explicitly.  I personally recommend throwing a ParseException for both 
wildcard and fuzzy queries, and show how to do so in the book.
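
As a rough sketch of the opt-in idea, the toy checker below rejects
fuzzy and wildcard terms unless the caller has enabled them.  None of
this is Lucene's API: RestrictedSyntaxSketch, Feature, and
SyntaxDisabledException are hypothetical names, the exception merely
stands in for a parser's ParseException, and a real parser would apply
the check while building the query rather than on raw term strings.

```java
import java.util.EnumSet;
import java.util.Set;

public class RestrictedSyntaxSketch {

    // Query features that can be expensive on a large index.
    public enum Feature { FUZZY, WILDCARD }

    // Hypothetical stand-in for a parser's ParseException.
    public static class SyntaxDisabledException extends Exception {
        public SyntaxDisabledException(String message) {
            super(message);
        }
    }

    private final Set<Feature> enabled;

    // Everything is off unless the caller opts in.
    public RestrictedSyntaxSketch(Set<Feature> optIn) {
        this.enabled = EnumSet.noneOf(Feature.class);
        this.enabled.addAll(optIn);
    }

    // Returns the term unchanged if it uses only enabled syntax,
    // otherwise throws.  Assumes '~' marks a fuzzy term and '*' or '?'
    // marks a wildcard term.
    public String checkTerm(String term) throws SyntaxDisabledException {
        if (term.indexOf('~') >= 0 && !enabled.contains(Feature.FUZZY)) {
            throw new SyntaxDisabledException("Fuzzy queries are disabled: " + term);
        }
        boolean wildcard = term.indexOf('*') >= 0 || term.indexOf('?') >= 0;
        if (wildcard && !enabled.contains(Feature.WILDCARD)) {
            throw new SyntaxDisabledException("Wildcard queries are disabled: " + term);
        }
        return term;
    }

    public static void main(String[] args) {
        RestrictedSyntaxSketch strict =
            new RestrictedSyntaxSketch(EnumSet.noneOf(Feature.class));
        try {
            strict.checkTerm("lucene~");
        } catch (SyntaxDisabledException e) {
            // prints Fuzzy queries are disabled: lucene~
            System.out.println(e.getMessage());
        }
    }
}
```

An application that trusts its users could construct the checker with
EnumSet.of(Feature.FUZZY) to turn fuzzy terms back on while leaving
wildcards disabled.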

