lucene-dev mailing list archives

From Christoph Goller
Subject Re: QueryParser and backwards-compatibility
Date Mon, 11 Oct 2004 10:25:40 GMT
Erik Hatcher wrote:
> On Oct 11, 2004, at 4:31 AM, Christoph Goller wrote:
>> It seems that I did not think enough about the changes in QueryParser.
>> They definitely break the API. Sorry for doing this a little bit too
>> hastily. The following changes in QueryParser break the API:
>> 1) Analyzer argument in both getFieldQuery methods
>> 2) Analyzer argument in getRangeQuery
>> 3) Additional minSimilarity argument in getFuzzyQuery
> For 1-3, as long as the old signature was added back (and possibly 
> deprecated), there is no problem keeping the new signature.

Ok, I will do this in the head and in branch 1.4.2.
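
As a rough illustration of point 3, restoring an old signature next to a new
one could look like the sketch below. The class name, the String return type
(standing in for a Query) and the constant are hypothetical and are not taken
from the actual QueryParser source.

```java
// Minimal sketch, NOT the real QueryParser: shows an old signature kept as a
// deprecated overload that delegates to the new one.
class QueryParserCompatSketch {
    // Assumed default, matching the 0.5 value discussed in this thread.
    static final float DEFAULT_MIN_SIMILARITY = 0.5f;

    /**
     * Old signature, restored so existing callers keep compiling.
     *
     * @deprecated use {@link #getFuzzyQuery(String, String, float)} instead
     */
    @Deprecated
    String getFuzzyQuery(String field, String term) {
        return getFuzzyQuery(field, term, DEFAULT_MIN_SIMILARITY);
    }

    /** New signature with the additional minSimilarity argument. */
    String getFuzzyQuery(String field, String term, float minSimilarity) {
        // A String stands in for the Query object to keep the sketch self-contained.
        return field + ":" + term + "~" + minSimilarity;
    }
}
```

Callers of the two-argument method get the same result as before, plus a
deprecation warning at compile time. How subclasses that override the old
method are handled is a separate design question that this sketch does not
address.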

>> 4) Default minimum similarity in query parser
>> 5) FuzzyQuery.toString which also contains minSimilarity
>> These things clearly could break existing applications. So the best
>> solution would be to undo them. I am not sure whether we should undo
>> 4 and 5, since they make fuzzy queries a little bit more usable and
>> QueryParser is able to read the new FuzzyQuery.toString. But other
>> applications may not ....
> You sent a follow-up e-mail about moving the default value back to 
> zero.  That seems fine to me.  It's no big deal about 
> FuzzyQuery.toString - I doubt folks are relying on its output to parse 
> again, so you can leave that behavior as is.  I just happened to have a 
> test case that relied on it, but no production code.

That mail was about the prefix length for FuzzyQuery, which has a very
strong impact on both performance and semantics. QueryParser 1.4.2 only
has a default value for minSimilarity; it does not touch prefixLength,
which therefore implicitly defaults to 0.
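
To make the effect of the prefix length concrete, here is a small sketch. It
assumes that a prefix length of n means only index terms sharing the first n
characters of the query term are considered at all; the class and method names
are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: models the prefix restriction as a plain filter over a term list.
final class PrefixFilterSketch {
    /** Returns the terms that share the first prefixLength characters of queryTerm. */
    static List<String> candidates(List<String> terms, String queryTerm, int prefixLength) {
        int n = Math.min(prefixLength, queryTerm.length());
        String prefix = queryTerm.substring(0, n);
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                out.add(term);
            }
        }
        return out;
    }
}
```

With a prefix length of 0 every term is a candidate, which is the expensive
case. With a prefix length of 2 and the query term "lucene", a term such as
"nucene" is no longer considered although it differs in only one character,
so the semantics change as well as the cost.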

I think minSimilarity (and its default value in QueryParser) was mainly
introduced in order to avoid TooManyClausesExceptions, since it reduces
the number of terms going into the rewritten BooleanQuery. Furthermore,
I think the default value should remain 0.5 even for 1.4.2: terms with a
Levenshtein distance greater than half of their length (that's the meaning
of minSimilarity == 0.5f) do not seem similar to me.
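
The relation between Levenshtein distance and minSimilarity can be sketched as
follows. The normalization by the shorter term's length and the inclusive
comparison at the threshold are assumptions made for illustration; the class
is hypothetical and is not the FuzzyQuery implementation.

```java
// Sketch only: edit distance plus a length-normalized similarity score.
final class FuzzySimilaritySketch {
    /** Classic dynamic-programming Levenshtein distance using two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1 - distance / length of the shorter term. */
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) levenshtein(a, b) / shorter;
    }

    /** Assumed inclusive threshold; the real boundary handling may differ. */
    static boolean matches(String a, String b, float minSimilarity) {
        return similarity(a, b) >= minSimilarity;
    }
}
```

Under these assumptions "lucene" and "lucent" (distance 1, length 6) pass a
threshold of 0.5 easily, while "abcd" and "wxyz" (distance 4, length 4) score
0 and are rejected.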

>> Since 1.4.2 is already out, we would have to make a version 1.4.3.

OK, one more vote needed :-)
Maybe we should wait to hear what Doug says.

