lucene-java-user mailing list archives

From Carsten Schnober <schno...@ids-mannheim.de>
Subject Re: Rewrite for RegexpQuery
Date Tue, 12 Mar 2013 09:12:36 GMT
On 11.03.2013 18:22, Michael McCandless wrote:
> On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober
> <schnober@ids-mannheim.de> wrote:
>> On 11.03.2013 13:38, Michael McCandless wrote:
>>> On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>
>>>> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE, then this should
>>>> work (after rewrite your query is a BooleanQuery, which supports extractTerms()).
>>>
>>> ... as long as you don't exceed the max number of terms allowed by BQ
>>> (1024 by default, but you can raise it).
>>
>> True, I've noticed this meanwhile. Are there any recommendations for
>> this setting where the limit is as large as possible while staying
>> within a reasonable performance? Of course, this is highly subjective,
>> but what's the magnitude here? Will a limit of 1,024,000 typically
>> increase the query time by the factor 1,000 too?
>> Carsten
> 
> I think 1024 may already be too high ;)
> 
> But really it depends on your situation: test different limits and see.
> 
> How much slower a larger query is depends on the specifics of the terms ...

For initial testing, I've raised the limit by a factor of 1,000. As Uwe
pointed out, I don't actually execute the query; I only extract the
terms. Used this way, thousands of terms cause no noticeable performance
problems, although I have yet to run a systematic evaluation.
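
To illustrate why the clause limit still matters when the query is only
rewritten and never executed, here is a minimal sketch in plain Java. It is
a simplified model, not Lucene code: the class TermExpansionSketch, the
method expand() and the sample dictionary are hypothetical, and
java.util.regex stands in for Lucene's own regular expression syntax. In
Lucene itself the limit is set with BooleanQuery.setMaxClauseCount(), and
exceeding it during rewrite raises BooleanQuery.TooManyClauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Simplified, Lucene-free model of expanding a regular expression into the
// index terms it matches, with a cap comparable to the max clause count.
class TermExpansionSketch {

    // Walks the sorted term dictionary, collects every term that fully
    // matches the regex, and fails as soon as one more term than
    // maxClauses would have to be collected.
    static List<String> expand(SortedSet<String> dictionary, String regex, int maxClauses) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String term : dictionary) {
            if (!pattern.matcher(term).matches()) {
                continue;
            }
            if (matches.size() == maxClauses) {
                throw new IllegalStateException("more than " + maxClauses + " matching terms");
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical term dictionary; a real index would supply these terms.
        SortedSet<String> dictionary =
                new TreeSet<>(Arrays.asList("haus", "hausen", "hund", "katze"));
        System.out.println(expand(dictionary, "ha.*", 1024)); // prints [haus, hausen]
    }
}
```

In this model the cost of extraction grows with the number of dictionary
terms visited and matched, not with any scoring work, which is consistent
with term extraction staying cheap even with a very high limit.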
Best,
Carsten


-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
