lucene-solr-user mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: Cached fq decreases performance
Date Fri, 04 Sep 2015 14:06:26 GMT
On Thu, Sep 3, 2015 at 4:45 PM, Jeff Wartes <jwartes@whitepages.com> wrote:
>
> I have a query like:
>
> q=<some complicated stuff>&fq=enabled:true
>
> For purposes of this conversation, "fq=enabled:true" is set for every query, I never
> open a new searcher, and this is the only fq I ever use, so the filter cache size is 1, and
> the hit ratio is 1.
> The fq=enabled:true clause matches about 15% of my documents. I have some 20M documents
> per shard, in a 5.3 solrcloud cluster.
>
> Under these circumstances, this alternate version of the query averages about 1/3 faster,
> consumes less CPU, and generates less garbage:
>
> q=<some complicated stuff> +enabled:true
>
> So it appears I have a case where using the cached fq result is more expensive than just
> putting the same restriction in the query.
> Does someone have a clear mental model of how “q” and “fq” interact?

Lucene seems to always be changing its execution model, so it can be
difficult to keep up.  What version of Solr are you using?
Lucene also changed how filters work, so now a filter is
incorporated with the query like so:

query = new BooleanQuery.Builder()
    .add(query, Occur.MUST)
    .add(pf.filter, Occur.FILTER)
    .build();
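
To make the semantics of that conjunction concrete, here is a minimal toy
sketch in plain Java. It is not Lucene code and says nothing about how
Lucene executes the query internally; it only shows that both clauses must
match, that scores come from the MUST clause alone, and that the FILTER
clause is a yes/no check. The class name, the conjunction helper, and the
use of java.util.BitSet as a stand-in for a cached filter are all
assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: this is NOT how Lucene implements BooleanQuery.
class FilterClauseSketch {

    // mustMatches: docId -> score for docs matching the main query (Occur.MUST).
    // filterBits:  one bit per doc, set when the doc passes the filter (Occur.FILTER).
    // Returns the docs matching both; scores come from the MUST clause alone.
    static Map<Integer, Float> conjunction(Map<Integer, Float> mustMatches, BitSet filterBits) {
        Map<Integer, Float> hits = new TreeMap<>();
        for (Map.Entry<Integer, Float> e : mustMatches.entrySet()) {
            if (filterBits.get(e.getKey())) {
                hits.put(e.getKey(), e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical main-query matches with made-up scores.
        Map<Integer, Float> mustMatches = new TreeMap<>();
        mustMatches.put(1, 2.5f);
        mustMatches.put(4, 1.0f);
        mustMatches.put(7, 0.3f);

        // Hypothetical "enabled:true" filter: docs 1, 7 and 9 are enabled.
        BitSet enabled = new BitSet(10);
        enabled.set(1);
        enabled.set(7);
        enabled.set(9);

        // Doc 4 is dropped by the filter; doc 9 never matched the main query.
        System.out.println(conjunction(mustMatches, enabled));
    }
}
```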

It may be that term queries are no longer worth caching... if this is
the case, we could automatically not cache them.

It also may be the structure of the query that is making the
difference.  Solr is creating

(complicated stuff) +(filter(enabled:true))

If you added +enabled:true directly to an existing boolean query, that
may be more efficient for Lucene to process (flatter structure).

If you haven't already, could you try putting parens around your
(complicated stuff) to see if it makes any difference?
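
The variants under discussion could be lined up for a side-by-side timing
run roughly as follows. This is a sketch in plain Java that only builds
request parameter strings; it does not talk to Solr. The class and method
names are hypothetical, <some complicated stuff> remains a placeholder for
the real query, and the {!cache=false} local parameter is an addition not
mentioned in this thread (Solr accepts it on an fq to bypass the filter
cache).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the (unencoded) parameter strings to compare.
// URL-encode the values before sending them to a real server.
class QueryVariants {

    static Map<String, String> variants(String q, String filter) {
        Map<String, String> v = new LinkedHashMap<>();
        // Original form: restriction as a cached filter query.
        v.put("cached-fq", "q=" + q + "&fq=" + filter);
        // Same fq, but asking Solr to skip the filter cache
        // (assumption: this variant is not from the thread).
        v.put("uncached-fq", "q=" + q + "&fq={!cache=false}" + filter);
        // Restriction appended to the main query as a required clause.
        v.put("inline", "q=" + q + " +" + filter);
        // As above, with the main query wrapped in parens.
        v.put("inline-parens", "q=(" + q + ") +" + filter);
        return v;
    }

    public static void main(String[] args) {
        variants("<some complicated stuff>", "enabled:true")
            .forEach((name, params) -> System.out.println(name + ": " + params));
    }
}
```

Timing each variant against the same warmed searcher would show whether the
difference comes from the cache lookup or from the shape of the query.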

-Yonik
