lucene-solr-user mailing list archives

From Shawn Heisey <>
Subject Re: Solr cache for specific field
Date Tue, 18 Aug 2015 15:36:35 GMT
On 8/18/2015 7:21 AM, Norgorn wrote:
> SOLR version - 4.10.3
> We have SOLR Cloud cluster, each node has documents only for several
> categories.
> Queries look like "...fq=cat(1 3 89 ...)&..."
> So, only some nodes need to process, others can answer with zero as soon as
> they check "cat".
> The problem is to keep separate cache for "cat" values on each node.
> As I understand, custom caches are available only for custom request
> handlers, but we are happy with default SearchHandler.

I'm curious why you need to make any changes at all.  Unless the number
of unique values in the cat field is extremely large, a query for a
nonexistent term will normally be extremely fast.

In the example you provided in a later message on this thread, you would
save 200 milliseconds on the entire query, so the 1300 milliseconds of
the next longest query would dominate your query time.  Although the
percentage is significant, this barely registers in human time
perception.  Based on the numbers you provided, which are fairly similar
for all nodes whether there are matches or not, I am thinking that this
field does NOT have a huge number of unique values.
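
A minimal sketch of that reasoning, assuming the overall time of a
distributed request is roughly the time of its slowest node (merge
overhead is ignored). The per-node figures are hypothetical and chosen
only to match the 200 and 1300 millisecond values mentioned above; they
are not measurements from the thread.

```python
def distributed_query_time(per_node_ms):
    """Approximate a distributed request's time as that of the slowest node."""
    return max(per_node_ms)


# Hypothetical per-node times in milliseconds (illustrative, not measured).
before = [1500, 1300, 1250]

# Assume the slowest node could answer instantly because it holds none of
# the requested categories.
after = [0, 1300, 1250]

saved = distributed_query_time(before) - distributed_query_time(after)
print("saved:", saved, "ms; remaining:", distributed_query_time(after), "ms")
```

Even when one node drops to zero, the request still waits for the next
slowest node.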

I think that a QTime of over one second per node for a simple search on
a category field indicates a major performance problem.  For comparison
purposes on one of my own indexes (not SolrCloud, but still
distributed), I do a query of "ip:get", and I see a QTime of 552
milliseconds.  Subsequent cached queries for the same information happen
in about 3 milliseconds.
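
A rough sketch of how such a comparison could be made, assuming a Solr
core reachable over HTTP and the JSON response writer (wt=json). The
base URL shown in the comment is a placeholder, and the helper names
extract_qtime and timed_query are made up for illustration. QTime is
read from the responseHeader section of the response.

```python
import json
import urllib.parse
import urllib.request


def extract_qtime(response_text):
    """Return the QTime (milliseconds) from a Solr JSON response body."""
    return json.loads(response_text)["responseHeader"]["QTime"]


def timed_query(base_url, query):
    """Run a query against base_url + '/select' and return its QTime."""
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    with urllib.request.urlopen(base_url + "/select?" + params) as resp:
        return extract_qtime(resp.read().decode("utf-8"))


# Example usage (placeholder URL, not executed here):
#   first = timed_query("http://localhost:8983/solr/mycore", "ip:get")
#   second = timed_query("http://localhost:8983/solr/mycore", "ip:get")
# A large drop from first to second suggests the second was served from cache.
```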

This query matches over 100 million docs -- 104073614, which is nearly
half of the 224214642 docs in the entire index.  The whole index (split
between seven shards on two machines) takes up over 250GB of disk
space.  It is not a small index.  There are 35 unique values in the ip
field.  I do not have enough memory on these servers for optimal
performance ... if I could put 128GB or more of RAM on each server
instead of the 64GB that's there now, my query times would likely be even faster.

