lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Filter vs. TermQuery performance
Date Sun, 09 May 2010 16:24:09 GMT
I think others will have more thoughts on this, esp. for Numeric* questions... but I'll try
answering...
 

----- Original Message ----
> From: Tomislav Poljak <tpoljak@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Fri, May 7, 2010 2:34:46 PM
> Subject: Filter vs. TermQuery performance
> 
> Hi,
> when is it wise to replace a TermQuery with cached Filter 
> (regarding search performance). If TermQuery is used only to filter results 
> based on field value (it doesn't participate in scoring), is it alway wise 
> to replace it with filter? 

Yes, assuming the filter will be reused.  I think there is not a lot of value in using a filter
(vs. just a regular query) if that filter will not be reused.  This is why in Solr "fq"s (filtered
queries) are cached in a special filter cache.  I *think* the only other benefit of using
a filter query vs., say, TermQuery, is that the former will not spend any time/CPU on computing
the score for the filter part.

> Is it only wise if Filter is cached (wrapped in CachingWrapperFilter) and reused often?

I think so.  See above.

> Does it matter how many 
> distinct values field has (which is related to how many matches/results for 
> one given/selected value is returned and also with how many times same filter 
> instance is reused)?

I *think* it matters.  I think the more docs a filter matches, the higher the benefit from
reusing a filter.

> For example, what if filter for single value matches 
> only 5% of docs, should filter be used or is it better to use TermQuery? 
> What about if filter for single value matches 20%? or 50% or 
> 75%

I'm not sure...

> I have a question regarding caching performance/memory usage. 
> Documents have date&time indexed (as NumericField) with minute resolution 
> and there are few thousands unique date&time in index. On the search 
> side open ended range filter is used (NumericRangeFilter) with current 
> time as a parameter.

> Now, is it wise to cache NumericRangeFilter here 
> (reuse instance of CachingWrapperFilter wrapping NumericRangeFilter) since it 
> will not be reused often (only from users searching at same time in same time 
> zone)?

If the cache hit rate is low, why waste memory on caching is what I would think is the logic
to apply here.
If you have 3 queries, and each uses a different date range query, then you will not see benefits
from caching..
If 2 of those 3 queries use the exact same date range query, then you will see caching benefits.

> Is it better to use NumericRangeFilter or NumericRangeQuery in this case?

I'm not sure, but I'd be happy to add specific advice to Javadoc when the answer is clear.

Otis

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message