lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Cached fq decreases performance
Date Fri, 04 Sep 2015 14:18:13 GMT
Yonik,

Is all of this visible at the query debug level? Would it be effective to ask
for both queries to be run with debug enabled and for the expanded query
values to be shared? Would that reveal the differences between the Lucene
implementations you described?

(Looking for troubleshooting tips to reuse).
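
A minimal sketch of that comparison, assuming a plain HTTP request to the
standard select handler: it builds the three variants discussed in this thread
(restriction as fq, restriction inlined into q, and inlined with parentheses
around the original query), each with debugQuery=true. The base URL, collection
name and placeholder query are hypothetical. This only produces the requests;
whether the internally combined query actually shows up in the debug output is
exactly the open question above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DebugRequests {
    // Hypothetical host and collection; substitute your own.
    static final String BASE = "http://localhost:8983/solr/collection1/select";

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Variant 1: restriction applied as a (cached) filter query.
    static String withFq(String q) {
        return BASE + "?q=" + encode(q) + "&fq=" + encode("enabled:true") + "&debugQuery=true";
    }

    // Variant 2: restriction appended directly to the main query.
    static String inline(String q) {
        return BASE + "?q=" + encode(q + " +enabled:true") + "&debugQuery=true";
    }

    // Variant 3: as variant 2, but the original query is wrapped in parentheses.
    static String inlineWithParens(String q) {
        return BASE + "?q=" + encode("(" + q + ") +enabled:true") + "&debugQuery=true";
    }

    public static void main(String[] args) {
        String q = "title:solr body:cache +type:doc"; // placeholder for the real query
        System.out.println(withFq(q));
        System.out.println(inline(q));
        System.out.println(inlineWithParens(q));
    }
}
```

Comparing the parsed-query sections of the three responses side by side is the
reusable part of the tip.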

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 4 September 2015 at 10:06, Yonik Seeley <yseeley@gmail.com> wrote:
> On Thu, Sep 3, 2015 at 4:45 PM, Jeff Wartes <jwartes@whitepages.com> wrote:
>>
>> I have a query like:
>>
>> q=<some complicated stuff>&fq=enabled:true
>>
>> For purposes of this conversation, "fq=enabled:true" is set for every query; I never
>> open a new searcher, and this is the only fq I ever use, so the filter cache size is 1 and
>> the hit ratio is 1.
>> The fq=enabled:true clause matches about 15% of my documents. I have some 20M documents
>> per shard, in a 5.3 SolrCloud cluster.
>>
>> Under these circumstances, this alternate version of the query averages about 1/3
>> faster, consumes less CPU, and generates less garbage:
>>
>> q=<some complicated stuff> +enabled:true
>>
>> So it appears I have a case where using the cached fq result is more expensive than
>> just putting the same restriction in the query.
>> Does someone have a clear mental model of how “q” and “fq” interact?
>
> Lucene seems to always be changing its execution model, so it can be
> difficult to keep up.  What version of Solr are you using?
> Lucene also changed how filters work, so now a filter is
> incorporated with the query like so:
>
> query = new BooleanQuery.Builder()
>     .add(query, Occur.MUST)
>     .add(pf.filter, Occur.FILTER)
>     .build();
>
> It may be that term queries are no longer worth caching... if that is
> the case, we could automatically skip caching them.
>
> It also may be the structure of the query that is making the
> difference.  Solr is creating
>
> (complicated stuff) +(filter(enabled:true))
>
> If you added +enabled:true directly to an existing boolean query, that
> may be more efficient for lucene to process (flatter structure).
>
> If you haven't already, could you try putting parens around your
> (complicated stuff) to see if it makes any difference?
>
> -Yonik
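
To make the "flatter structure" point concrete, here is a toy Java model of the
three query shapes discussed above. The Node, Leaf, Clause and Bool types are
hypothetical stand-ins, not Lucene classes, and the clauses used for
"complicated stuff" are invented for illustration. It only measures nesting
depth; it says nothing about actual execution cost, and it ignores that
+enabled:true in q is a scoring MUST clause while the fq becomes a non-scoring
FILTER clause.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are NOT Lucene classes.
interface Node {}

record Leaf(String text) implements Node {}

record Clause(String occur, Node node) {} // occur: "MUST", "SHOULD" or "FILTER"

record Bool(List<Clause> clauses) implements Node {}

class QueryShapes {
    // Hypothetical stand-in for "<some complicated stuff>".
    static List<Clause> complicatedClauses() {
        return List.of(
            new Clause("SHOULD", new Leaf("title:solr")),
            new Clause("SHOULD", new Leaf("body:cache")),
            new Clause("MUST", new Leaf("type:doc")));
    }

    // fq path: the whole main query becomes one MUST clause next to a FILTER clause.
    static Node fqShape() {
        return new Bool(List.of(
            new Clause("MUST", new Bool(complicatedClauses())),
            new Clause("FILTER", new Leaf("enabled:true"))));
    }

    // q=<stuff> +enabled:true: the extra clause joins the existing top-level query.
    static Node flatShape() {
        List<Clause> clauses = new ArrayList<>(complicatedClauses());
        clauses.add(new Clause("MUST", new Leaf("enabled:true")));
        return new Bool(clauses);
    }

    // q=(<stuff>) +enabled:true: the parentheses restore the nesting of the fq path.
    static Node parenShape() {
        return new Bool(List.of(
            new Clause("MUST", new Bool(complicatedClauses())),
            new Clause("MUST", new Leaf("enabled:true"))));
    }

    // Nesting depth: a leaf counts 0, each boolean level adds 1.
    static int depth(Node n) {
        if (n instanceof Bool b) {
            int max = 0;
            for (Clause c : b.clauses()) {
                max = Math.max(max, depth(c.node()));
            }
            return 1 + max;
        }
        return 0;
    }

    static int topLevelClauses(Node n) {
        return n instanceof Bool b ? b.clauses().size() : 0;
    }

    public static void main(String[] args) {
        System.out.println("fq:     depth " + depth(fqShape()) + ", top-level clauses " + topLevelClauses(fqShape()));
        System.out.println("flat:   depth " + depth(flatShape()) + ", top-level clauses " + topLevelClauses(flatShape()));
        System.out.println("parens: depth " + depth(parenShape()) + ", top-level clauses " + topLevelClauses(parenShape()));
    }
}
```

The placeholder query includes a required clause on purpose: when a boolean
query already has a MUST clause, its SHOULD clauses are optional, so appending
+enabled:true matches the same documents as the fq version. If the query had
only optional clauses, appending +enabled:true without parentheses would change
which documents match, which is another reason to compare against the
parenthesized form.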
