lucene-java-user mailing list archives

From "Terry Steichen" <te...@net-frame.com>
Subject Re: SubstringQuery -- Re: Leading Wild Card Search
Date Tue, 17 Feb 2004 20:06:33 GMT
Doug,

What you say makes a good deal of sense to me.  Could you give us a relative
sense of the "slowness" of different operators?

Regards

Terry

----- Original Message -----
From: "Doug Cutting" <cutting@apache.org>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Tuesday, February 17, 2004 1:16 PM
Subject: Re: SubstringQuery -- Re: Leading Wild Card Search


> David Spencer wrote:
> > 2 files attached, SubstringQuery (which you'll use) and
> > SubstringTermEnum ( used by the former to be
> > consistent w/ other Query code).
> >
> > I find this kind of query useful to have and think that the query parser
> > should allow it in spite of the perception
> > of this being slow.  However, I think the debate is the "user-centric
> > view" (say mine: allow substring queries)
> > vs. the "protect the engine's performance" view, which says not to allow
> > expensive queries.
>
> I think the argument is more complex.
>
> One issue is cost of execution: very slow queries can be used to
> implement a denial-of-service attack.  Maybe that's an overstatement,
> but in a web server setting, once a few slow searches are running, no
> others may complete.  When folks hit "Stop" in their browser the server
> does not stop processing the query.  If they hit "Reload" then another
> new search is started.  So these can be very problematic.  This is real.
> Lots of folks have deployed Lucene with large indexes and then found
> that their server randomly crashes.  Closer scrutiny shows that they
> were permitting operators that are too slow for their combination of
> index size and query traffic.  The BooleanQuery.TooManyClauses exception
> was added to address this, but it can still be too late, if the problem
> is caused before the query is built, e.g., while enumerating all terms.
>
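[A minimal sketch of the point about term enumeration, not Lucene code. It assumes the term dictionary can be modeled as a sorted set of strings; the class TermScanSketch, the visit budget, and TooManyTermsException are hypothetical names invented for this illustration. A prefix match can seek into the sorted terms and stop early, while a substring match has to visit every term, so a guard has to fire during enumeration rather than after the query is built.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

public class TermScanSketch {
    /** Hypothetical exception for this sketch; it is not a Lucene class. */
    public static class TooManyTermsException extends RuntimeException {
        public TooManyTermsException(String message) {
            super(message);
        }
    }

    /** Prefix match: seek to the prefix, then stop at the first term that no longer matches. */
    public static List<String> prefixTerms(SortedSet<String> dict, String prefix, int visitBudget) {
        List<String> matches = new ArrayList<>();
        int visited = 0;
        for (String term : dict.tailSet(prefix)) {
            if (++visited > visitBudget) {
                throw new TooManyTermsException("visited more than " + visitBudget + " terms");
            }
            if (!term.startsWith(prefix)) {
                break; // sorted order guarantees no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    /** Substring match: sort order does not help, so every term in the dictionary is visited. */
    public static List<String> substringTerms(SortedSet<String> dict, String needle, int visitBudget) {
        List<String> matches = new ArrayList<>();
        int visited = 0;
        for (String term : dict) {
            // The budget is checked while enumerating, before any query clauses exist.
            if (++visited > visitBudget) {
                throw new TooManyTermsException("visited more than " + visitBudget + " terms");
            }
            if (term.contains(needle)) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

[In this sketch the prefix scan costs roughly the number of matching terms, while the substring scan costs the size of the whole dictionary regardless of how many terms match.]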
> A related issue is that users (and even most developers) don't
> understand the relative costs of different query operators.  Some things
> are fast, others are surprisingly slow.  That's not a great user
> experience, and triggers problems like those described above.  People
> think that the rare slow cases are network problems or something, and
> hit "Reload".
>
> I have no problem with including slow operators with Lucene, but they
> should be well documented as such, at least for developers.  Perhaps we
> should make a pass through the existing Query classes, in particular
> those which expand into other queries, and add some performance notes,
> so that folks don't blindly start using things which may bite them.  By
> default I think it would be safest if the QueryParser only permitted
> operators which are efficient.  Folks can then, at their own risk,
> enable other operators.
>
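[A minimal sketch of the "efficient by default, opt in at your own risk" idea, assuming a check on the raw query string before it reaches any parser. QueryGuardSketch and its permits method are hypothetical names for this illustration and are not part of the QueryParser API; the tokenizing is deliberately naive and ignores quoting and escaping.]

```java
public class QueryGuardSketch {
    private final boolean allowLeadingWildcard;

    /** Default configuration: the slow operator is disabled. */
    public QueryGuardSketch() {
        this(false);
    }

    /** Callers who accept the cost can opt in explicitly. */
    public QueryGuardSketch(boolean allowLeadingWildcard) {
        this.allowLeadingWildcard = allowLeadingWildcard;
    }

    /** Returns true if every whitespace-separated term in the raw query is permitted. */
    public boolean permits(String rawQuery) {
        for (String token : rawQuery.trim().split("\\s+")) {
            // Drop an optional "field:" prefix so "title:*foo" is treated like "*foo".
            int colon = token.indexOf(':');
            String term = colon >= 0 ? token.substring(colon + 1) : token;
            if (term.isEmpty()) {
                continue;
            }
            char first = term.charAt(0);
            boolean leadingWildcard = first == '*' || first == '?';
            if (leadingWildcard && !allowLeadingWildcard) {
                return false;
            }
        }
        return true;
    }
}
```

[A front end could call permits before parsing and return a clear "operator not enabled" message instead of starting a search that may not finish.]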
> In summary, removing operators can be user-centric, if it removes
> unpredictability.  And the reason for protecting engine performance is
> not miserly, it's to guarantee availability.  And finally, an issue
> dear to me, a predictable search engine results in fewer spurious bug
> reports, saving developer time for real bugs.
>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>



