lucene-solr-user mailing list archives

From Shawn Heisey <>
Subject Re: Solr crawls during replication
Date Fri, 03 Sep 2010 17:46:19 GMT
  On 9/2/2010 9:31 AM, Mark wrote:
> Thanks for the suggestions. Our slaves have 12G with 10G dedicated to 
> the JVM... too much?
> Are the rsync snappuller features still available in 1.4.1? I may try 
> that to see if it helps. Configuration of the switches may also be possible.
> Also, would you mind explaining your second point... using dual NIC 
> cards? How can this be accomplished/configured? Thanks for your help.

I will first admit that I am a relative newbie at this whole thing, so 
find yourself a grain of salt before you read further ...

While it's probably not a bad idea to change to an rsync method and 
implement bandwidth throttling, I'm betting the real root of your issue 
is that you're low on memory, making your disk cache too small.  When 
you do a replication, the simple act of copying the data shoves the 
current index completely out of RAM, so when you do a query, it has to 
go back to the disk (which is now VERY busy) to satisfy it.

Unless you know for sure that you need 10GB dedicated to the JVM, go 
with a much smaller heap, because out of the 12GB available, 10GB only 
leaves about 1.5GB for the OS disk cache, assuming the machine has no GUI 
and no other processes.  If you need the JVM that large because you have very large 
Solr caches, consider reducing their size dramatically.  In deciding 
whether to use precious memory for the OS disk cache or Solr caches, the 
OS should go first.  Additionally, if you have large Solr caches with a 
small disk cache and configure large autowarm counts, you end up with 
extremely long commit times.
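
To make the arithmetic above concrete, here is a minimal sketch in Python. The function name and the 0.5GB allowance for the OS and other processes are my own assumptions, chosen only so the result matches the "about 1.5GB" figure; a real JVM process also uses some memory beyond its heap, so treat the result as an upper bound.

```python
def disk_cache_left_gb(total_ram_gb, jvm_heap_gb, other_overhead_gb=0.5):
    """Estimate RAM left for the OS disk cache, in GB.

    other_overhead_gb is an assumed allowance for the OS itself and any
    other processes; it is not a measured value.
    """
    return max(total_ram_gb - jvm_heap_gb - other_overhead_gb, 0.0)


# The numbers from this thread: a 12GB machine with a 10GB JVM heap.
print(disk_cache_left_gb(12, 10))  # 1.5

# The same machine with a hypothetical 4GB heap leaves far more for caching.
print(disk_cache_left_gb(12, 4))   # 7.5
```

Against a 30GB index, the first case can keep only a small fraction of the index in the disk cache, which is why a replication copy pushes it out so easily.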

I don't know how the 30GB of data in your index is distributed among the 
various Lucene files, but for an index that size, I'd want to have 
between 8GB and 16GB of RAM available to the OS just for disk caching, 
and if more is possible, even better.  If you could get more than 32GB 
of RAM in the server, your entire index would fit, and it would be very 
fast.

With a little research, I came up (on my own) with what I think is a 
decent rule of thumb, and I'm curious what the experts think of this 
idea:  Find out how much space is taken by the index files with the 
following extensions: fnm, fdx, frq, nrm, tii, tis, and tvx.  Think of 
that as a bare minimum disk cache size, then shoot for between 1.5 and 3 
times that value for your disk cache, so it can also cache parts of the 
other files.
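
If you want to try the rule of thumb on your own index, something like the following Python sketch would add up the sizes. The function names are hypothetical, and it assumes all the index files sit directly in one directory (no recursion into subdirectories). It is only an illustration of the idea above, not anything that ships with Solr.

```python
import os

# File extensions named in the rule of thumb above.
CORE_EXTENSIONS = {"fnm", "fdx", "frq", "nrm", "tii", "tis", "tvx"}


def core_index_bytes(index_dir, extensions=CORE_EXTENSIONS):
    """Sum the sizes of files in index_dir whose extension is in extensions."""
    total = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if os.path.isfile(path) and ext in extensions:
            total += os.path.getsize(path)
    return total


def disk_cache_target(core_bytes, low=1.5, high=3.0):
    """Return (bare minimum, low target, high target) disk cache sizes in bytes."""
    return core_bytes, core_bytes * low, core_bytes * high
```

Calling `disk_cache_target(core_index_bytes("/path/to/index"))` (with your own index directory in place of that placeholder path) gives the bare minimum plus the 1.5x and 3x targets, which you can compare against the RAM left over after the JVM heap.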

