lucene-solr-user mailing list archives

From Hayden Muhl <haydenm...@gmail.com>
Subject Re: Performance issues with facets and filter query exclusions
Date Sat, 19 Jul 2014 02:37:51 GMT
That query is representative of some of the queries in my test, but I
didn't notice any correlation between using the match all docs query and
poor query performance. Here's another example of a query that took longer
than expected.

qt=en&q=dress green leather&fq=userId:(2222383)&fq={!tag=productRetailerId}productRetailerId:(83 644)&fq={!tag=productCanonicalColorId}productCanonicalColorId:(16 7 13)&facet.field={!ex=productRetailerId}productRetailerId&facet=true&facet.mincount=1&facet.limit=100

This query took over five seconds. Here I'm just doing one facet on the
field "productRetailerId". For the actual search results, Solr will have to
do an intersection of four queries: "dress green leather", "userId:(...)",
"productRetailerId:(...)" and "productCanonicalColorId:(...)". For the
facet, it will have to compute an intersection on the same queries
excluding the "productRetailerId:(...)" query.
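
As a rough illustration of the set logic described above, here is a toy
sketch in Python. The document IDs are made up, the filters are keyed by
field name for simplicity, and plain Python sets stand in for DocSets; this
is not how Solr implements it internally.

```python
# Toy model of the intersections described above. All document IDs are
# invented for illustration; Python sets stand in for Solr DocSets.

def intersect_all(docsets):
    """Intersect a non-empty list of sets."""
    result = set(docsets[0])
    for ds in docsets[1:]:
        result &= ds
    return result


def result_and_facet_base(query_docs, tagged_filters, excluded_tag):
    # Search results: main query intersected with every filter.
    results = intersect_all([query_docs, *tagged_filters.values()])
    # Facet base: the same intersection, minus the excluded filter.
    kept = [ds for tag, ds in tagged_filters.items() if tag != excluded_tag]
    facet_base = intersect_all([query_docs, *kept])
    return results, facet_base


query_docs = {1, 2, 3, 4, 5, 6}  # docs matching "dress green leather"
filters = {
    "userId": {1, 2, 3, 4, 5},
    "productRetailerId": {1, 2},
    "productCanonicalColorId": {1, 3, 4, 9},
}

results, facet_base = result_and_facet_base(
    query_docs, filters, excluded_tag="productRetailerId"
)
print(sorted(results))     # [1]
print(sorted(facet_base))  # [1, 3, 4]
```

The facet base is always a superset of the search results, because it is
built from one fewer filter.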

To your point about the match all docs query, there are plenty of examples
which ran quickly with a match all docs query. I've put together a Google
spreadsheet with some of my test results.

https://docs.google.com/spreadsheets/d/149k6_CM6JuGMbqhZIfiJetTxDxXcWdKeGU6FomjwO9Y/edit?usp=sharing

I ran another test with some simplified facet queries. In these examples, I
only did one facet at a time, and never faceted on a field I was running a
filter query on. These are examples of queries I would run to get the same
functionality as filter query exclusion.

https://docs.google.com/spreadsheets/d/1xzS2sbb6btyvydD6Q5X8ecD82DE92Pls-DbK2nwdTvc/edit?usp=sharing

Most of these queries run in under 100 ms, but even the slowest tend to be
under 500 ms. I can reproduce the functionality of the five-second query at
the beginning of this email by running two of these simplified queries.
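
A minimal sketch of what that split could look like, using only the Python
standard library. The parameter values are read from the wrapped query at
the top of this email. The exact shape of the two requests, and the use of
rows=0 on the facet-only request, are assumptions for illustration rather
than the queries actually used in the test.

```python
from urllib.parse import urlencode

Q = "dress green leather"

# Filter queries from the slow request, keyed by field name.
FILTERS = {
    "userId": "userId:(2222383)",
    "productRetailerId": "productRetailerId:(83 644)",
    "productCanonicalColorId": "productCanonicalColorId:(16 7 13)",
}


def results_params():
    """Request 1: search results with every filter applied, no faceting."""
    params = [("qt", "en"), ("q", Q)]
    params += [("fq", fq) for fq in FILTERS.values()]
    return params


def facet_params(field):
    """Request 2: facet on `field` with that field's own filter left out.

    rows=0 is an assumption: only the facet counts are needed here.
    """
    params = [("qt", "en"), ("q", Q), ("rows", "0")]
    params += [("fq", fq) for name, fq in FILTERS.items() if name != field]
    params += [
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),
        ("facet.limit", "100"),
    ]
    return params


print(urlencode(results_params()))
print(urlencode(facet_params("productRetailerId")))
```

Neither request needs a tag or an exclusion, since the facet request simply
omits the filter on the field being faceted.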

There are examples in my first spreadsheet where a filter exclusion is
happening and the query performs just fine. However, it seems that all slow
queries have a filter exclusion, and no queries without a filter exclusion
have query times longer than a second.

For reference, all these tests were done on a non-optimized core with about
80 million records, and no indexing happening. Each of the spreadsheets
represents performance on a warmed core. I warmed the core by running the
test for about a minute before gathering this data. The spreadsheets are
output from Solr Meter. I can post logs if that's easier to look at.

- Hayden


On Fri, Jul 18, 2014 at 11:48 AM, Yonik Seeley <yonik@heliosearch.com>
wrote:

> On Fri, Jul 18, 2014 at 2:10 PM, Hayden Muhl <haydenmuhl@gmail.com> wrote:
> > I was doing some performance testing on facet queries and I noticed
> > something odd. Most queries tended to be under 500 ms, but every so often
> > the query time jumped to something like 5000 ms.
> >
> > q=*:*&fq={!tag=productBrandId}productBrandId:(156 1227)&facet.field={!ex=productBrandId}productBrandId&facet=true
> >
> > I noticed that the drop in performance happened any time I had a filter
> > query tag match up with a facet exclusion.
>
> Is this an actual query that took a long time, or just an example?
> My guess is that "q" is actually much more expensive.
>
> If a filter is excluded, the base DocSet for faceting must be re-computed.
> This involves intersecting all the DocSets for the other filters not
> excluded (which should all be cached) with the DocSet of the query
> (which won't be cached and will need to be generated).  That last step
> can be expensive, depending on the query.
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data
>
