lucene-java-user mailing list archives

From Greg Bowyer <>
Subject Re: Strange bug when we enable faceting
Date Thu, 03 Nov 2011 00:09:36 GMT
Ignore this!!

Through testing and code review today I discovered what the filter cache
is actually used for, and why my previous thinking was wrong: I had the
cache size set too large once all of the other things the filter cache
stores are taken into account.

On 02/11/11 11:17, Greg Bowyer wrote:
> When I enable faceting in Solr, for some reason our incoming user queries
> start being cached in the filter cache, which very quickly causes the
> instance to run out of memory. We could lower the size of the
> filterCache, but I feel this is a band-aid over a far odder problem.
> I have been investigating the heap dumps that were created on our
> instances when they ran out of memory. These dumps show (unless YourKit is
> being dishonest) that the filterCache contains
> BoostedQuery(BooleanQuery(DisjunctionMaxQuery)) objects, each of
> which contains Term objects that I would not expect to see in the
> filterCache.
> A snapshot of the object graph can be seen here.
> In terms of our index, queries and setup: we have a Solr 3.3 setup with
> sharding, where some nodes act as aggregators and the rest act as
> slaves or shards. As per recommendations, the aggregators act as
> dispatchers for searches but do not themselves serve any index data.
> Most of our search queries differ in the search terms but generally have
> the following form:
>       path=/aggregator/ params={fl=docid,pid,score&start=0&q=dat+data+cartridge&fq=+parent_cids:438&fq=+dtype:(1+OR+2)&rows=20
>       path=/select params={fl=docid,score&start=0&q=polyethylene+bench+storage&enable=true&isShard=true&wt=javabin&fq=+rev_type:[1+TO+2]&fq=+parent_cids:25000500&fq=+dtype:(1+OR+2)&fsv=true&rows=20&version=2
> Breaking this down, the fqs defined are against three fields:
>       * parent_cids - This field contains 1394 terms; there are a few
>                       permutations for this field, but I would expect
>                       at most ~10000 distinct fqs for this field
>       * dtype - This field has 2 terms, and we only ever query it as
>                 shown above; it's reserved for some future work and
>                 would only ever have at most 8 terms
>       * rev_type - Similar to dtype, we only have 3 terms in this field
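To put the filter counts above in perspective, here is a rough worst-case estimate of filterCache memory, sketched in Java. It assumes every cached filter is held as a bitset with one bit per document in the index (Solr can store small result sets more compactly, so this is an upper bound), and the maxDoc value is a hypothetical placeholder, not a figure taken from our shards.

```java
// Back-of-the-envelope filterCache sizing. ASSUMPTION: every cached filter
// is a bitset with one bit per document (a worst case; small result sets
// may be stored more compactly by Solr).
public class FilterCacheEstimate {

    /** Bytes for one bitset covering maxDoc documents, rounded up to whole bytes. */
    static long bitsetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Worst-case heap used by the given number of cached filters. */
    static long worstCaseBytes(long entries, long maxDoc) {
        return entries * bitsetBytes(maxDoc);
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L; // HYPOTHETICAL document count per shard
        // Distinct fq strings described above: up to ~10000 parent_cids
        // filters, plus one dtype filter and one rev_type filter as queried today.
        long fqEntries = 10_000 + 1 + 1;
        long bytes = worstCaseBytes(fqEntries, maxDoc);
        System.out.printf("%d entries x %d bytes = %.1f GiB%n",
                fqEntries, bitsetBytes(maxDoc), bytes / (1024.0 * 1024 * 1024));
    }
}
```

Every extra entry that lands in the cache (for example, one per distinct user query) adds another maxDoc/8 bytes in this worst case, which is why the configured cache size matters so much.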
> None of our filters are generally user-accessible, and we ensure that
> clients always provide filter queries in the same order to avoid
> duplicate fqs (that is, we go to some lengths to avoid things like
> fq=+dtype:(2+OR+1) appearing, since we already cache fq=+dtype:(1+OR+2)).
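A minimal sketch of the kind of client-side normalisation described above, assuming filters of the simple form field:(v1 OR v2 ...). The FqCanonicalizer class and its regex are hypothetical illustrations, not part of Solr: they sort the OR'd values so that equivalent filters are always sent as the same fq string, and should therefore map to the same filterCache entry.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// HYPOTHETICAL client-side helper: rewrites a simple fq of the form
// "[+]field:(v1 OR v2 ...)" so the OR'd values are always in sorted order.
// Anything that does not match that simple shape is returned unchanged.
public class FqCanonicalizer {
    private static final Pattern OR_GROUP =
            Pattern.compile("^(\\+?[A-Za-z_][A-Za-z0-9_]*):\\((.+)\\)$");

    static String canonicalize(String fq) {
        Matcher m = OR_GROUP.matcher(fq.trim());
        if (!m.matches()) {
            return fq; // not a simple OR group (e.g. a range query); leave as-is
        }
        String[] values = m.group(2).trim().split("\\s+OR\\s+");
        Arrays.sort(values); // lexicographic order; only consistency matters here
        return m.group(1) + ":(" + String.join(" OR ", values) + ")";
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("+dtype:(2 OR 1)"));    // +dtype:(1 OR 2)
        System.out.println(canonicalize("+rev_type:[1 TO 2]")); // +rev_type:[1 TO 2]
    }
}
```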
> Our search handler is defined with some basic parameters, as follows:
> ---- %<  ----
> <requestHandler name="search" class="solr.SearchHandler" default="true">
>   <!-- default values for query parameters can be specified; these
>        will be overridden by parameters in the request
>   -->
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <str name="qf">title^1.0 descr^0.5 mft^0.5 brand^0.5</str>
>     <str name="pf">title^3 descr^0.5</str>
>     <str name="boost">product(redir,bid)</str>
>     <str name="ps">4</str>
>     <str name="mm">50%</str>
>     <str name="defType">edismax</str>
>     <int name="rows">20</int>
>     <str name="facet">true</str>
>     <str name="facet.field">price_bucket</str>
>     <!-- per-field facet overrides take the form f.<field>.facet.<param> -->
>     <str name="f.price_bucket.facet.sort">count</str>
>     <str name="f.price_bucket.facet.mincount">1</str>
>     <str name="f.price_bucket.facet.limit">100</str>
>     <str name="facet.mincount">1</str>
>   </lst>
> </requestHandler>
> ---->% ----
> price_bucket is a field that we derive at index time: it takes a field
> we store called price and creates a term that reflects the range (or
> bucket) of prices that the given document falls into. I originally
> attempted to use facet counts directly but found that the instance ran
> out of memory; at the time we assumed that our range of prices and the
> granularity of our "buckets" were creating too many filter queries. For
> reference, there are 239 unique terms in the price_bucket field.
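For illustration, the index-time bucketing described above could look something like the following Java sketch. The class name, bucket boundaries and term format are all hypothetical (the real field has 239 distinct terms); it uses Arrays.binarySearch over sorted upper bounds to find the bucket a price falls into.

```java
import java.util.Arrays;

// HYPOTHETICAL index-time helper: maps a document's price onto a bucket term
// for a field like price_bucket. The boundaries are invented for illustration.
public class PriceBucketer {
    // Sorted, exclusive upper bounds of each bucket (illustrative values only).
    private static final double[] UPPER_BOUNDS = {10.0, 25.0, 50.0, 100.0, 250.0};

    /** Returns a term such as "10-25" for the bucket containing price. */
    static String bucketFor(double price) {
        int idx = Arrays.binarySearch(UPPER_BOUNDS, price);
        // A price equal to a bound belongs to the next bucket (bounds are
        // exclusive); otherwise binarySearch returns -(insertionPoint) - 1.
        int bucket = idx >= 0 ? idx + 1 : -idx - 1;
        String lower = bucket == 0 ? "0" : fmt(UPPER_BOUNDS[bucket - 1]);
        String upper = bucket == UPPER_BOUNDS.length ? "*" : fmt(UPPER_BOUNDS[bucket]);
        return lower + "-" + upper;
    }

    /** Formats whole-number bounds without a trailing ".0". */
    private static String fmt(double d) {
        return d == Math.rint(d) ? Long.toString((long) d) : Double.toString(d);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(12.5));  // 10-25
        System.out.println(bucketFor(999.0)); // 250-*
    }
}
```

Because each document gets exactly one bucket term, faceting on price_bucket counts terms in a single field rather than running one query per price range.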
> At present our installation, indexing practices and queries are very
> vanilla; we are doing nothing esoteric out of the box.
> This is a fairly undesirable issue, as it means that our filterCache
> fills rapidly with cache items that are unlikely ever to be required
> again.
> Does anyone have any ideas on what could be causing this?
> -- Greg Bowyer
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
