lucene-dev mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: sloppyFreq - why on Similarity?
Date Mon, 29 Sep 2003 20:56:51 GMT
Erik Hatcher wrote:
>> I think this is historical.  Most of the other query classes are ones 
>> I've implemented, and I've added relevant methods to Similarity for 
>> them.  WildcardQuery and FuzzyQuery were contributed.  I've never used 
>> them in an application, as I think they're potential performance 
>> pitfalls, so I've probably ignored them when maintaining Similarity.
> 
> 
> Yeah, I'm digging deep into these implementations and see the 
> performance pitfall too.  It's kind of scary, actually.  With a wide open 
> QueryParser, it could be a potential DoS attack to force a ton of fuzzy 
> or wildcard queries through.  It's almost as if we should force the 
> enabling of those features in QueryParser and have it off by default.  
> Just an idea.

Note that BooleanQuery.maxClauseCount was added to keep these sorts of 
queries from blowing things up too much.  That limit mainly addresses 
out-of-memory issues, though.  If every query contains a generous 
wildcard, or is fuzzy at all, things can still quickly grind to a halt.  
A fuzzy query, even if it doesn't match many terms, can be very slow, 
consuming large amounts of I/O and CPU.
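
To illustrate the distinction, here is a minimal, self-contained sketch of 
a clause cap applied during term expansion.  It does not use Lucene's 
classes: ClauseLimitSketch, expandPrefix, termsScanned and the in-memory 
term list are hypothetical names made up for this example.  It only shows 
the general idea that a cap bounds how many clauses are held in memory, 
while the scan over the term dictionary still has to happen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not Lucene's API: models a cap on expanded clauses.
class ClauseLimitSketch {
    // Assumed cap on the number of clauses one expansion may produce.
    static int maxClauseCount = 1024;

    // Number of dictionary terms visited by the most recent expansion.
    static int termsScanned = 0;

    static class TooManyClauses extends RuntimeException {
        TooManyClauses(String message) {
            super(message);
        }
    }

    // Expands a prefix against a term list, one clause per matching term.
    // The cap bounds memory use; it does not reduce the scanning work done
    // before the cap is hit, or when only a few terms match.
    static List<String> expandPrefix(String prefix, List<String> terms) {
        List<String> clauses = new ArrayList<>();
        termsScanned = 0;
        for (String term : terms) {
            termsScanned++;
            if (term.startsWith(prefix)) {
                if (clauses.size() >= maxClauseCount) {
                    throw new TooManyClauses(
                        "more than " + maxClauseCount + " clauses for prefix: " + prefix);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }
}
```

In this sketch a prefix that matches only two terms still visits every 
term in the list, which is the kind of cost a clause cap does not address.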

As a start, we could add parameters to the query parser to disable these 
features.  If we disable them by default then lots of people will howl: 
they're popular features.  Perhaps in a future major release they could 
be disabled by default, but, for now, I think it would at least be good 
to be able to disable them.
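
A rough sketch of what such parameters might look like, purely as an 
illustration: RestrictedParserSketch, allowWildcard, allowFuzzy and check 
are hypothetical names, not existing QueryParser members.  The sketch 
assumes a simplified syntax where a term containing * or ? is a wildcard 
and a term ending in ~ is fuzzy, and it ignores phrases and escaping.

```java
// Hypothetical pre-check, not part of Lucene: rejects query strings that
// use features the application has switched off.
class RestrictedParserSketch {
    // Both features stay enabled by default, so existing users see no change.
    boolean allowWildcard = true;
    boolean allowFuzzy = true;

    // Throws IllegalArgumentException if the query uses a disabled feature.
    void check(String query) {
        for (String token : query.trim().split("\\s+")) {
            boolean isWildcard = token.indexOf('*') >= 0 || token.indexOf('?') >= 0;
            boolean isFuzzy = token.endsWith("~");
            if (isWildcard && !allowWildcard) {
                throw new IllegalArgumentException("wildcard queries are disabled: " + token);
            }
            if (isFuzzy && !allowFuzzy) {
                throw new IllegalArgumentException("fuzzy queries are disabled: " + token);
            }
        }
    }
}
```

An application exposing a wide-open search box could then set both flags 
to false before parsing, while everyone else keeps the current behavior.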

Doug

