lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: constant-score rewrite mode for NumericRangeQuery
Date Sat, 18 Jul 2009 10:54:35 GMT
Hi Mike,

I did some perf tests with the well-known PerfTest.java from the
FieldCacheRangeFilter JIRA issue.

I compared both rewrite modes on a 5 million document index with precStep=4:

With constant score rewrite: 
avg number of terms: 68.3
TRIE: best time=6.192687 ms; worst time=463.0907 ms; avg=222.64312909999998 ms; sum=31994466

With boolean rewrite:
avg number of terms: 68.3
TRIE: best time=12.674237 ms; worst time=583.702957 ms; avg=257.912947 ms; sum=31994466
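
To put the two result lines side by side, here is a small sketch that computes how much slower Boolean rewrite was for each reported figure. The class and method names are made up for illustration; the timings are copied from the output above. It comes out at roughly a factor of two for the best case and about 1.16 for the average.

```java
import java.util.Locale;

// Hypothetical helper, not part of PerfTest.java: compares the timings reported above.
class RewriteTimings {
    // Ratio of Boolean rewrite time to constant score rewrite time (values above 1 mean Boolean is slower).
    static double ratio(double booleanMs, double constantScoreMs) {
        return booleanMs / constantScoreMs;
    }

    public static void main(String[] args) {
        // Figures taken verbatim from the two TRIE result lines.
        System.out.printf(Locale.ROOT, "best:  %.2fx%n", ratio(12.674237, 6.192687));
        System.out.printf(Locale.ROOT, "worst: %.2fx%n", ratio(583.702957, 463.0907));
        System.out.printf(Locale.ROOT, "avg:   %.2fx%n", ratio(257.912947, 222.64312909999998));
    }
}
```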

Both sets of numbers were taken after some warm-up queries, and the random seed
was identical (so exactly the same queries were run). For this index size,
constant score rewrite still looks faster than Boolean rewrite. The warm-up
queries in particular take much longer with Boolean rewrite. The problem with
my test is that the whole index seems to be in the OS cache. If it were not,
I think the much longer time the first Boolean queries took would become more
important.

In my opinion, we should keep constant score enabled. My main problem with
Boolean rewrite is the completely useless scoring. A range query should
always have a constant score. We could maybe fix this at some point in the
future so that you can disable scoring for Boolean queries (e.g.
bq.setDoConstantScore(true)). I think this is covered by a dedicated issue in
JIRA (I do not know the number yet).

A second problem with Boolean rewrite: with precStep=4, the query is
guaranteed not to hit the 1024 max clause limit (see the formula for the
theoretical maximum number of terms), so there is no problem at all. The
problem starts if you combine two or three numeric queries with
BooleanClause.Occur.MUST in a top-level Boolean query (the typical example is
a geo query). In this case, the Boolean queries that consist only of MUST
clauses may be combined into one big one (correct me if I am wrong), and then
the max clause count becomes a problem.
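
As a rough illustration of the clause-count argument, the sketch below evaluates the theoretical upper bound on the number of terms as I recall it from the NumericRangeQuery documentation: n = (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1) * 2 + (2^precisionStep - 1). The formula, the assumption of 64-bit values, and the class and method names are not taken from this message, so treat them as assumptions and check them against the javadocs.

```java
// Hypothetical helper: evaluates the assumed worst-case term bound for a trie range query.
class TrieTermBound {
    // The max clause limit of 1024 mentioned in the message.
    static final int MAX_CLAUSES = 1024;

    // Assumed formula; exact only when precisionStep divides bitsPerValue evenly.
    static int maxTerms(int bitsPerValue, int precisionStep) {
        int perLevel = (1 << precisionStep) - 1;   // 2^precisionStep - 1 terms per level and side
        int levels = bitsPerValue / precisionStep; // number of precision levels
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        for (int step : new int[] {4, 8}) {
            int n = maxTerms(64, step);
            System.out.println("precStep=" + step + ": at most " + n + " terms, "
                    + (n <= MAX_CLAUSES ? "within" : "over") + " the clause limit");
        }
        // Worst case if several precStep=4 range queries really were merged into one BooleanQuery.
        for (int queries = 1; queries <= 3; queries++) {
            int total = queries * maxTerms(64, 4);
            System.out.println(queries + " combined queries: at most " + total + " clauses");
        }
    }
}
```

Under these assumptions a single precStep=4 query stays well below 1024 clauses, while three merged ones, or a single precStep=8 query, could exceed it in the worst case.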

If we change the default, keep in mind to reopen SOLR-940, as it assumes
constant score mode is the default and Solr's default precStep is 8 ->
*bang*. Maybe the Solr people should fix this anyway and explicitly set the
mode for all range queries.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Friday, July 17, 2009 8:56 PM
> To: java-dev@lucene.apache.org
> Subject: constant-score rewrite mode for NumericRangeQuery
> 
> Should we really default to constant-score rewrite with NumericRangeQuery?
> 
> Would BooleanQuery rewrite mode give better performance on a large
> index, since the number of terms should be smallish w/ the default
> precisionStep (4), I think?
> 
> Mike
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org


