lucene-solr-user mailing list archives

From aaronireland <>
Subject Solr filterCache and autoWarming memory requirements
Date Sun, 02 Nov 2014 12:49:46 GMT
I have a Solr server set up on CentOS that's being queried from a Flask app in
a very specific/controlled way. Basically, I have a large (200 million
records), largely static set of name/address data (along with an internal
record ID field and a few integer fields). I'm running 50 threads that each
need to search on name/address/birth-date and return an ID value and an
integer modeling score as quickly as possible.

Here is the schema.xml information for the fields I'm using:

   <field name="external_id" type="string" indexed="true" stored="false"
required="false" multiValued="false" />
   <field name="internal_id" type="string" indexed="false" stored="true"
multiValued="false" />
   <field name="score" type="int" indexed="false" stored="true" />

   <field name="first_name" type="text_general" indexed="true"
   <field name="last_name" type="text_general" indexed="true"
   <field name="city" type="text_general" indexed="true" stored="true"/>
   <field name="state" type="string" indexed="true" stored="true"/>

   <field name="birth_year" type="string" indexed="true" stored="false" />
   <field name="birth_month" type="string" indexed="true" stored="false" />
   <field name="birth_day" type="string" indexed="true" stored="false" />
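
For reference, the kind of request each thread sends can be sketched as below. This is only an illustration: the host, port, core name, helper name, and the birth_month value format ("07") are assumptions, not taken from my actual setup. The point is that state and birth month go in separate fq parameters, so each one can be cached as its own filterCache entry.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_lookup_url(first_name, last_name, state, birth_month):
    """Build a select URL with one fq per filter (hypothetical helper).

    No escaping of Solr query syntax is done here; real input would
    need special characters escaped before being placed in q or fq.
    """
    params = [
        ("q", f"first_name:{first_name} AND last_name:{last_name}"),
        ("fq", f"state:{state}"),              # cached independently
        ("fq", f"birth_month:{birth_month}"),  # cached independently
        # Note: "score" in fl is also Solr's name for the relevance
        # score pseudo-field, so how it resolves is worth checking.
        ("fl", "internal_id,score"),
        ("rows", "1"),
        ("wt", "json"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_lookup_url("john", "smith", "CA", "07"))
```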

I had a similar setup working well when I was using 1-4 threads, but since
increasing the number of threads querying the Solr server I've been running
into out-of-memory errors. I removed the autowarming filter queries from
solrconfig.xml, increased the RAM on the box to 24 GB and the JVM heap to
8 GB, and changed the directoryFactory from MMap to NIOFS. That solved the
memory problems, but performance is pretty bad, with most queries taking over
1 second to return a response.

Here's a screenshot showing the breakdown of a heap dump I did before I
upped the RAM/JVM the first time: 

Since I'm only querying Solr in a very specific way, I'd like to set up the
filterCache so that filters on U.S. state abbreviation and birth month are
cached. How much memory would I need for that?
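
A rough way to bound this, as a sketch under stated assumptions rather than a definitive answer: a filterCache entry for a large result set is commonly described as a bitset with one bit per document in the index, while small result sets are stored as sorted int arrays. The small-set cutoff of maxDoc/64 used below is an assumption about Solr 4.x internals, the function names are hypothetical, the entry counts (50 states, 12 months) are illustrative, and per-object overhead is ignored.

```python
BYTES_PER_INT = 4


def bitset_bytes(max_doc):
    """Bytes for a bitset with one bit per document (64-bit words)."""
    words = (max_doc + 63) // 64
    return words * 8


def estimate_entry_bytes(max_doc, matches):
    """Estimate one filterCache entry's size in bytes.

    Assumption: result sets up to max_doc >> 6 documents are kept as
    sorted int arrays (4 bytes per match); larger ones use a bitset.
    """
    small_set_limit = max_doc >> 6
    if matches <= small_set_limit:
        return matches * BYTES_PER_INT
    return bitset_bytes(max_doc)


MAX_DOC = 200_000_000  # index size from the post

# Upper bound: every state and month filter stored as a full bitset.
ENTRY_COUNT = 50 + 12  # illustrative: one entry per state and per month
upper_bound = ENTRY_COUNT * bitset_bytes(MAX_DOC)

if __name__ == "__main__":
    print(bitset_bytes(MAX_DOC))  # 25000000 bytes per bitset entry
    print(upper_bound)            # 1550000000 bytes for 62 entries
```

Under these assumptions each large entry costs about 25 MB, so 62 separate state and month filters come to roughly 1.55 GB in the worst case. While a new searcher is being autowarmed, the old and new caches exist at the same time, so the peak can be about double that.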

Here's an example of what I had previously (now commented out) in the
QuerySenderListener to auto-warm the filterCaches:

        <lst><str name="q">*:*</str><str name="fq">state:CA</str><str
        <lst><str name="q">*:*</str><str name="fq">state:CA</str><str
        <lst><str name="q">*:*</str><str name="fq">state:CA</str><str
        <lst><str name="q">*:*</str><str name="fq">state:CA</str><str

The number of documents matching each of these queries ranges from a
few thousand to one million.
