lucene-dev mailing list archives

From Erik Hatcher
Subject Re: sloppyFreq - why on Similarity?
Date Mon, 29 Sep 2003 11:20:53 GMT
On Wednesday, September 24, 2003, at 02:31  PM, Doug Cutting wrote:
> Erik Hatcher wrote:
>> WildcardQuery and FuzzyQuery do have the capability of affecting the 
>> scoring, although only FuzzyQuery seems to take advantage of this.  
>> There is no way on a Similarity implementation to affect the factors 
>> applied by these queries though.  So there is some inconsistency on 
>> these types of things.
> I think this is historical.  Most of the other query classes are ones 
> I've implemented, and I've added relevant methods to Similarity for 
> them.  WildcardQuery and FuzzyQuery were contributed.  I've never used 
> them in an application, as I think they're potential performance 
> pitfalls, so I've probably ignored them when maintaining Similarity.

Yeah, I'm digging deep into these implementations and see the 
performance pitfall too.  It's kind of scary, actually.  With a wide 
open QueryParser, forcing a ton of fuzzy or wildcard queries through 
could amount to a DoS attack.  Maybe those features should have to be 
enabled explicitly in QueryParser and be off by default.  Just an 
idea.
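
A minimal sketch of the "off by default" idea, assuming a check that runs on the raw query text before it ever reaches QueryParser. The class QuerySyntaxGuard, its Feature enum, and isAllowed are hypothetical names for illustration, not Lucene API. It also assumes a backslash escapes the next character, and it treats every unescaped '~' as fuzzy syntax, so it would reject phrase slop as well.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical pre-parse guard; not part of Lucene's QueryParser. */
public final class QuerySyntaxGuard {

    /** Expensive query features that a caller must enable explicitly. */
    public enum Feature { WILDCARD, FUZZY }

    private final Set<Feature> enabled;

    public QuerySyntaxGuard(Set<Feature> enabled) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case.
        this.enabled = enabled.isEmpty()
                ? EnumSet.noneOf(Feature.class)
                : EnumSet.copyOf(enabled);
    }

    /** Returns true if the raw query text uses only enabled features. */
    public boolean isAllowed(String queryText) {
        boolean escaped = false;
        for (int i = 0; i < queryText.length(); i++) {
            char c = queryText.charAt(i);
            if (escaped) {            // previous char was a backslash: skip this one
                escaped = false;
                continue;
            }
            if (c == '\\') {
                escaped = true;
                continue;
            }
            if ((c == '*' || c == '?') && !enabled.contains(Feature.WILDCARD)) {
                return false;
            }
            if (c == '~' && !enabled.contains(Feature.FUZZY)) {
                return false;
            }
        }
        return true;
    }
}
```

A guard built with an empty feature set accepts "title:lucene" and rejects "luc*" and "lucene~"; passing EnumSet.of(Feature.WILDCARD) lets "luc*" through while still rejecting "lucene~".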

>   It is also a judgement call as to when something should be specified 
> per-query (e.g., boost, phrase slop, etc.) and when it is a policy to 
> be set for all queries (IDF computation, document length 
> normalization, etc.).

Hmmm..... more to think about for me.

>> So now we need a getFuzzyFreq and getWildcardFreq?!  :)
> I think you're being sarcastic, but, to be consistent, yes, if we 
> believe these have parameters that are more about ranking policy than 
> are query-specific.

Yes, I was being sarcastic.  I think for fun I'll try my hand at 
creating a WildcardQuery variant that scores by closeness.
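
What "closeness" would mean for a wildcard match is not spelled out here. One possible reading, purely as an assumption: the fraction of the matched term that is pinned down by non-'*' characters in the pattern. The sketch below is standalone Java, not a Lucene class; WildcardCloseness, matches, and closeness are hypothetical names, and the recursive matcher is kept simple rather than fast.

```java
/** Hypothetical closeness measure for wildcard matches; not Lucene code. */
public final class WildcardCloseness {

    private WildcardCloseness() {}

    /** Simple recursive matcher: '*' matches any run of chars, '?' exactly one. */
    static boolean matches(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Try letting '*' consume zero or more characters of the term.
            for (int k = ti; k <= t.length(); k++) {
                if (matches(p, pi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (ti == t.length()) {
            return false;
        }
        if (c == '?' || c == t.charAt(ti)) {
            return matches(p, pi + 1, t, ti + 1);
        }
        return false;
    }

    /**
     * Returns 0 for a non-match; otherwise the share of the term's characters
     * that were consumed by non-'*' pattern characters (1.0 when no '*' was needed).
     */
    public static float closeness(String pattern, String term) {
        if (!matches(pattern, 0, term, 0)) {
            return 0.0f;
        }
        if (term.isEmpty()) {
            return 1.0f;
        }
        int fixed = 0;
        for (int i = 0; i < pattern.length(); i++) {
            if (pattern.charAt(i) != '*') {
                fixed++;
            }
        }
        return (float) fixed / term.length();
    }
}
```

Under this definition "luc*" scores 0.75 against "lucy" and 0.5 against "lucene", so shorter expansions rank higher. A real variant would presumably feed a value like this into the per-term scoring factor, which is the hook FuzzyQuery is described above as using.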

> The idea is to centrally locate the ranking policy.  I guess you could 
> alternately make these all methods on various query classes.  But if, 
> e.g. idf() were a method on TermQuery, it would make construction of 
> generic query parsers more difficult.
> Do you have another design to propose?

No, not at all.  I'm just thinking out loud as I learn more and more 
about the internals.  Way cool stuff, and the design is superb.
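
To make the "centrally locate the ranking policy" point concrete, here is a rough sketch of the trade-off, not Lucene's actual Similarity class. RankingPolicy, StrictPolicy, and scorePhrase are hypothetical names, and the inverse-distance weighting in sloppyFreq is chosen only for illustration. The scoring code asks one policy object for its weights, so swapping the policy changes ranking for every query without touching the query classes or a parser that builds them.

```java
/** Hypothetical sketch of a central ranking policy; not Lucene's Similarity. */
public final class CentralPolicyDemo {

    private CentralPolicyDemo() {}

    /** Policy set once for all queries. */
    public static class RankingPolicy {
        /** Weight for a sloppy phrase match at the given distance (illustrative formula). */
        public float sloppyFreq(int distance) {
            return 1.0f / (distance + 1);
        }
    }

    /** Alternative policy: only exact phrase matches count. */
    public static class StrictPolicy extends RankingPolicy {
        @Override
        public float sloppyFreq(int distance) {
            return distance == 0 ? 1.0f : 0.0f;
        }
    }

    /**
     * Query-side code: sums the policy's weight for each phrase match.
     * It carries per-query data (the match distances) but no ranking rules.
     */
    public static float scorePhrase(RankingPolicy policy, int[] matchDistances) {
        float freq = 0.0f;
        for (int distance : matchDistances) {
            freq += policy.sloppyFreq(distance);
        }
        return freq;
    }
}
```

With match distances {0, 1}, the base policy yields 1.0 + 0.5 and StrictPolicy yields 1.0, from the same scorePhrase code. Putting sloppyFreq on the query class instead would mean whoever constructs the query, including a generic parser, has to know which ranking rules to plug in.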

