lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: Performance stats for indeces with over 10MM documents
Date Thu, 03 Jan 2008 05:34:10 GMT

On 2-Jan-08, at 5:47 PM, Alex Benjamen wrote:
>
> gender:m AND status:(2 || 8 || 6 || 3) AND age:(26 || 27 || 28 || 29) AND orientation:3
> gender:f AND age:(27 || 28 || 29 || 30 || 31 || 32 || 33 || 34 || 35 || 36 ) AND orientation:2 AND photos:y
> gender:f AND (activity:y) AND age:(28 || 29 || 30 || 31 || 32 || 33 ) AND orientation:2

I think the problem is the nature of your queries.  Lucene/Solr is
optimized for full-text search rather than for complicated boolean
constraint queries like these.

The first step:
  - set q.alt = *:* as a default parameter
  - convert all disjunctions to separate fq parameters
  - use range queries when querying contiguous disjunctions
    (age:[28 TO 33] rather than what you have above).

This should:
  - allow your filters to cache separately, improving reuse
  - make queries that are combinations of previously-cached filters  
be nothing but a few bitset intersections
  - convert the expensive, heap-based age filter disjunction into a  
bitset created directly from the term enum

There is a lot of lucenese in what I just wrote, but the gist is that  
your queries should be substantially faster.
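
As a rough illustration of the first step, here is a small Python sketch that builds the request parameters for the third query quoted above. The helper name build_params is made up for this example. Splitting every constraint into its own fq is one reading of the advice. The sketch passes q.alt per request only to keep it self-contained (the suggestion above is to set it as a handler default), and it assumes a handler that honors q.alt.

```python
from urllib.parse import urlencode

def build_params(filters):
    """Return Solr request parameters with one fq per constraint.

    Hypothetical helper for illustration; not part of any Solr client.
    """
    # Match-all main query. Assumes a dismax-style handler that honors q.alt.
    params = [("q.alt", "*:*")]
    # Each constraint becomes its own fq so it can be cached independently.
    params.extend(("fq", f) for f in filters)
    return params

filters = [
    "gender:f",
    "activity:y",
    "age:[28 TO 33]",   # range query instead of age:(28 || 29 || ... || 33)
    "orientation:2",
]

query_string = urlencode(build_params(filters))
print(query_string)
```

Because each fq is cached on its own, a later query that reuses, say, gender:f and orientation:2 with a different age range only has to compute the new age filter.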

If that isn't sufficient, it is likely that a lot of performance  
could be gained by creating a means of caching filter disjuncts which  
would then be combined (essentially, each age would be a separate  
bitset and the union would be taken at query time).  These kinds of  
things don't arise all that often in the Solr world, though, so there  
isn't built-in capability for this.  It wouldn't be atrocious to  
implement, though.
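
To make the idea concrete, here is a toy Python model of such a disjunct cache. It is not Solr or Lucene code: the class DisjunctFilterCache, its methods, and the sample documents are all invented for illustration, and Python integers stand in for bitsets. Each field:value pair gets its own cached bitset, a disjunction is the union of those bitsets, and the full query is the intersection of the resulting filters.

```python
class DisjunctFilterCache:
    """Toy model: one cached bitset per (field, value), stored as a Python int."""

    def __init__(self, docs):
        self.docs = docs    # list of dicts; a document's id is its list index
        self.cache = {}     # (field, value) -> int used as a bitset

    def term_bits(self, field, value):
        """Return the bitset of documents whose field equals value, caching it."""
        key = (field, value)
        if key not in self.cache:
            bits = 0
            for doc_id, doc in enumerate(self.docs):
                if doc.get(field) == value:
                    bits |= 1 << doc_id
            self.cache[key] = bits
        return self.cache[key]

    def any_of(self, field, values):
        """Union of the per-value bitsets, taken at query time."""
        bits = 0
        for value in values:
            bits |= self.term_bits(field, value)
        return bits


def doc_ids(bits):
    """List the document ids whose bit is set."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


docs = [
    {"gender": "f", "age": 28, "orientation": 2},
    {"gender": "m", "age": 29, "orientation": 3},
    {"gender": "f", "age": 35, "orientation": 2},
    {"gender": "f", "age": 31, "orientation": 2},
]

cache = DisjunctFilterCache(docs)
matches = (cache.term_bits("gender", "f")
           & cache.any_of("age", range(28, 34))   # ages 28..33, one bitset each
           & cache.term_bits("orientation", 2))
print(doc_ids(matches))  # [0, 3]
```

A query over a different set of ages would reuse whichever per-age bitsets are already cached and only build the missing ones.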

-Mike

