lucene-solr-user mailing list archives

From Shawn Heisey <>
Subject Re: fq efficiency
Date Wed, 06 Nov 2013 00:34:51 GMT
On 11/5/2013 3:36 PM, Scott Schneider wrote:
> I'm wondering if filter queries are efficient enough for my use cases.  I have lots and
> lots of users in a big, multi-tenant, sharded index.  To run a search, I can use an fq on
> the user id and pass in the search terms.  Does this scale well with the # users?  I suppose
> that, since user id is indexed, generating the filter data (which is cached) will be fast.
> And looking up search terms is fast, of course.  But if the search term is a common one that
> many users have in their documents, then Solr may have to perform an intersection between
> two large sets:  docs from all users with the search term and all of the current user's docs.
>
> Also, how about auto-complete and searching with a trailing wildcard?  As I understand
> it, these work well in a single-tenant index because keywords are sorted in the index, so
> it's easy to get all the search terms that match "foo*".  In a multi-tenant index, all users'
> keywords are stored together.  So if Lucene were to look at all the keywords from "foo" to
> "foozzzzz" (I'm not sure if it actually does this), it would skip over a large majority of
> keywords that don't belong to this user.

From what I understand, there's not really a whole lot of difference 
between queries and filter queries when they are NOT cached, except that 
the main query and the filter queries are executed in parallel, which 
can save time.

When filter queries are found in the filterCache, it's a different 
story.  They get applied *before* the main query, which means that the 
main query won't have to work as hard.  The filterCache stores 
information about which documents in the entire index match the filter.  
By storing it as a bitset, the amount of space required is relatively 
low.  Applying filterCache results is very efficient.
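
To make the bitset idea concrete, here is a minimal sketch in Python. It is only a toy model of the concept, not Solr's actual implementation (which is Java): a Python int stands in for the bitset, and the names filter_cache, build_filter_bitset, apply_filter and filtered_search, along with the sample document ids, are all made up for illustration. One bit per document means a cached entry for an index of N documents costs roughly N/8 bytes, no matter how many documents match.

```python
# Toy model of a cached filter stored as a bitset.
# Assumption: a Python int stands in for the bitset; Solr's real
# filterCache is implemented in Java and works on internal doc ids.

filter_cache = {}  # hypothetical cache: filter string -> bitset


def build_filter_bitset(matching_doc_ids):
    """Return an int whose bit i is set when document i matches the filter."""
    bits = 0
    for doc_id in matching_doc_ids:
        bits |= 1 << doc_id
    return bits


def apply_filter(filter_bits, candidate_doc_ids):
    """Keep only the candidates whose bit is set in the filter bitset."""
    return [d for d in candidate_doc_ids if (filter_bits >> d) & 1]


def filtered_search(fq, fq_matches, main_query_hits):
    """Build the bitset on first use, then reuse it for later searches."""
    if fq not in filter_cache:
        filter_cache[fq] = build_filter_bitset(fq_matches)
    return apply_filter(filter_cache[fq], main_query_hits)


# Hypothetical data: documents owned by one user, and hits for a common term.
user_docs = [1, 4, 5, 9]
term_hits = [0, 1, 2, 5, 7, 9, 10]

result = filtered_search("user_id:42", user_docs, term_hits)
print(result)
```

Each membership check is a single bit test, which is why intersecting a common search term with one user's documents stays cheap once the filter for that user is cached.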

There are also advanced techniques, like assigning a cost to each filter 
and creating postfilters:
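
As a rough illustration of the cost idea (a sketch under assumptions, not something taken from this thread), the Python snippet below builds the query string for a request with two filters. The cache and cost local params are standard Solr syntax, and a filter can only run as a postfilter when it has cache=false, a cost of 100 or more, and a query type that supports post filtering, such as frange. The field names user_id and popularity and the helper build_select_params are hypothetical.

```python
from urllib.parse import urlencode


def build_select_params(terms, user_id):
    """Return request parameters as a list so that fq can be repeated."""
    return [
        ("q", terms),
        # Ordinary filter: cached in the filterCache and reused across searches.
        ("fq", f"user_id:{user_id}"),
        # Expensive filter: not cached, and given a high cost so that it is
        # checked only for documents that passed everything else.
        # Assumption: "popularity" is a numeric field in the schema.
        ("fq", "{!frange l=0 cache=false cost=200}popularity"),
    ]


params = build_select_params("foo", 42)
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to the select handler URL of whatever Solr core or collection is being queried.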

