lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Rewrite for RegexpQuery
Date Tue, 12 Mar 2013 09:39:41 GMT
Hi Carsten,

I would suggest using my example code with the fake query and custom rewrite. It does
not have the overhead of BooleanQuery and, more importantly, you don't need to change
the *global* and *static* default in BooleanQuery. Otherwise you could introduce a
denial-of-service case into your application if, somewhere else, you execute a wildcard
query using the Boolean rewrite with an unlimited number of terms.
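
For comparison, that Boolean-rewrite route looks roughly like this (just a sketch
against the Lucene 4.x API; the field name, pattern, and limit are made-up
placeholders, and "reader" stands for your IndexReader):

// imports: java.util.*, org.apache.lucene.index.Term, org.apache.lucene.search.*
RegexpQuery q = new RegexpQuery(new Term("body", "abc.*"));
q.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
BooleanQuery.setMaxClauseCount(100000);  // static: raises the limit for *every* BooleanQuery in the JVM
Query rewritten = q.rewrite(reader);     // now a BooleanQuery (throws TooManyClauses if over the limit)
Set<Term> terms = new HashSet<Term>();
rewritten.extractTerms(terms);           // the terms the regexp expanded to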

The custom rewrite with the fake query to collect the terms was posted in another mail
on this thread.
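
In outline, such a term-collecting rewrite could look like the sketch below (written
against the Lucene 4.x API; the method name, field name, and pattern are placeholders,
and the empty BooleanQuery returned at the end is just the fake query; it is never
executed):

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RegexpQuery;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;

Set<Term> collectRegexpTerms(IndexReader reader) throws IOException {
  final Set<Term> collected = new HashSet<Term>();
  RegexpQuery q = new RegexpQuery(new Term("body", "abc.*"));  // placeholder field/pattern
  q.setRewriteMethod(new MultiTermQuery.RewriteMethod() {
    @Override
    public Query rewrite(IndexReader reader, MultiTermQuery query) throws IOException {
      for (AtomicReaderContext ctx : reader.leaves()) {
        Terms terms = ctx.reader().terms(query.getField());
        if (terms == null) continue;
        // getTermsEnum() returns the enum already filtered by the regexp's automaton
        TermsEnum te = getTermsEnum(query, terms, new AttributeSource());
        BytesRef br;
        while ((br = te.next()) != null) {
          collected.add(new Term(query.getField(), BytesRef.deepCopyOf(br)));
        }
      }
      return new BooleanQuery();  // fake query, only returned to satisfy the contract
    }
  });
  q.rewrite(reader);  // triggers the collecting rewrite
  return collected;
}

Because no BooleanQuery clause is built per term, the 1024-clause limit never comes
into play and nothing global has to be changed.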

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Carsten Schnober [mailto:schnober@ids-mannheim.de]
> Sent: Tuesday, March 12, 2013 10:13 AM
> To: java-user@lucene.apache.org
> Subject: Re: Rewrite for RegexpQuery
> 
> Am 11.03.2013 18:22, schrieb Michael McCandless:
> > On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober
> > <schnober@ids-mannheim.de> wrote:
> >> Am 11.03.2013 13:38, schrieb Michael McCandless:
> >>> On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler <uwe@thetaphi.de>
> wrote:
> >>>
> >>>> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE; then this
> >>>> should work (after the rewrite, your query is a BooleanQuery, which
> >>>> supports extractTerms()).
> >>>
> >>> ... as long as you don't exceed the max number of terms allowed by BQ
> >>> (1024 by default, but you can raise it).
> >>
> >> True, I've noticed this in the meantime. Are there any recommendations
> >> for this setting, i.e. a limit that is as large as possible while keeping
> >> performance reasonable? Of course, this is highly subjective, but what's
> >> the magnitude here? Will a limit of 1,024,000 typically increase the
> >> query time by a factor of 1,000 as well?
> >> Carsten
> >
> > I think 1024 may already be too high ;)
> >
> > But really it depends on your situation: test different limits and see.
> >
> > How much slower a larger query is depends on the specifics of the terms ...
> 
> For the purpose of initial testing, I've increased the limit by a factor of 1,000.
> As Uwe pointed out, I don't actually execute the query but only extract the
> terms. In this regard, there are no performance issues with thousands of
> terms, although I still have to perform a systematic evaluation.
> Best,
> Carsten
> 
> 
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
> Next Generation Corpus Analysis Platform
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

