lucene-solr-user mailing list archives

From Mark <>
Subject Re: Solr crawls during replication
Date Fri, 03 Sep 2010 20:36:20 GMT
On 9/3/10 11:37 AM, Jonathan Rochkind wrote:
> Is the OS disk cache something you configure, or something the OS just does
> automatically based on available free RAM? Or does it depend on the exact OS?
> Thinking about the OS disk cache is new to me. Thanks for any tips.
> ________________________________________
> From: Shawn Heisey []
> Sent: Friday, September 03, 2010 1:46 PM
> To:
> Subject: Re: Solr crawls during replication
> On 9/2/2010 9:31 AM, Mark wrote:
>> Thanks for the suggestions. Our slaves have 12GB, with 10GB dedicated to
>> the JVM. Too much?
>> Are the rsync snappuller features still available in 1.4.1? I may try
>> that to see if it helps. Configuration of the switches may also be possible.
>> Also, would you mind explaining your second point, using dual NICs?
>> How can this be accomplished/configured? Thanks for your help.
> I will first admit that I am a relative newbie at this whole thing, so
> find yourself a grain of salt before you read further ...
> While it's probably not a bad idea to change to an rsync method and
> implement bandwidth throttling, I'm betting the real root of your issue
> is that you're low on memory, making your disk cache too small.  When
> you do a replication, the simple act of copying the data shoves the
> current index completely out of RAM, so when you do a query, it has to
> go back to the disk (which is now VERY busy) to satisfy it.
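The eviction effect described above can be illustrated with a toy model. This is a minimal sketch that assumes a pure LRU page cache, which is a simplification (real kernels use more elaborate replacement policies). The names "live_index" and "new_index" are hypothetical labels for the index being queried and the index being copied in by replication.

```python
from __future__ import annotations

from collections import OrderedDict


class ToyPageCache:
    """Simplified LRU model of an OS page cache (an assumption, not how any real kernel works exactly)."""

    def __init__(self, capacity_pages: int) -> None:
        self.capacity = capacity_pages
        # Keys are (file, page) pairs, ordered from least to most recently used.
        self.pages = OrderedDict()

    def read(self, file: str, page: int) -> None:
        key = (file, page)
        if key in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(key)
            return
        # Cache miss: load the page, evicting the least recently used page if full.
        self.pages[key] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

    def cached_pages_of(self, file: str) -> int:
        return sum(1 for f, _ in self.pages if f == file)


def simulate(cache_pages: int, index_pages: int) -> tuple[int, int]:
    """Return how many live-index pages are cached (before, after) a replication copy."""
    cache = ToyPageCache(cache_pages)
    for p in range(index_pages):  # queries warm up the live index
        cache.read("live_index", p)
    before = cache.cached_pages_of("live_index")
    for p in range(index_pages):  # replication streams a same-sized new index through the cache
        cache.read("new_index", p)
    after = cache.cached_pages_of("live_index")
    return before, after


if __name__ == "__main__":
    for capacity in (150, 450, 700):
        print(capacity, simulate(cache_pages=capacity, index_pages=300))
```

When the cache cannot hold both copies, the live index loses pages in proportion to the shortfall. With a cache half the size of one index, none of the live index survives the copy, so the next queries all go to disk.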
> Unless you know for sure that you need 10GB dedicated to the JVM, go
> with much smaller values, because out of the 12GB available, that will
> only leave you about 1.5GB, assuming the machine has no GUI and no other
> processes.  If you need the JVM that large because you have very large
> Solr caches, consider reducing their size dramatically.  In deciding
> whether to use precious memory for the OS disk cache or Solr caches, the
> OS should go first.  Additionally, if you have large Solr caches with a
> small disk cache and configure large autowarm counts, you end up with
> extremely long commit times.
> I don't know how the 30GB of data in your index is distributed among the
> various Lucene files, but for an index that size, I'd want to have
> between 8GB and 16GB of RAM available to the OS just for disk caching,
> and if more is possible, even better.  If you could get more than 32GB
> of RAM in the server, your entire index would fit, and it would be very
> fast.
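The memory arithmetic in the two paragraphs above can be sketched as follows. The 0.5GB OS overhead is an assumption, chosen so that a 12GB machine with a 10GB heap leaves the roughly 1.5GB mentioned. The 4GB and 2GB heap sizes are hypothetical alternatives, not recommendations from the thread.

```python
from __future__ import annotations

# Assumption: memory used by the OS itself (no GUI, no other processes).
OS_OVERHEAD_GB = 0.5
# Disk cache range suggested in the message for a ~30GB index.
SUGGESTED_CACHE_GB = (8.0, 16.0)


def disk_cache_gb(total_ram_gb: float, jvm_heap_gb: float,
                  os_overhead_gb: float = OS_OVERHEAD_GB) -> float:
    """RAM left over for the OS disk cache after the JVM heap and OS overhead."""
    return max(0.0, total_ram_gb - jvm_heap_gb - os_overhead_gb)


def index_fraction_cached(cache_gb: float, index_gb: float) -> float:
    """Upper bound on the share of the index that can fit in the disk cache."""
    return min(1.0, cache_gb / index_gb)


def in_suggested_range(cache_gb: float) -> bool:
    low, high = SUGGESTED_CACHE_GB
    return low <= cache_gb <= high


def ram_to_cache_whole_index(index_gb: float, jvm_heap_gb: float,
                             os_overhead_gb: float = OS_OVERHEAD_GB) -> float:
    """Total RAM needed so the entire index can sit in the disk cache."""
    return index_gb + jvm_heap_gb + os_overhead_gb


if __name__ == "__main__":
    index_gb = 30.0
    for heap in (10.0, 4.0, 2.0):  # 10GB is Mark's setting; 4GB and 2GB are hypothetical
        cache = disk_cache_gb(12.0, heap)
        print(f"heap {heap:4.1f}GB -> cache {cache:4.1f}GB, "
              f"{index_fraction_cached(cache, index_gb):.0%} of index, "
              f"in 8-16GB range: {in_suggested_range(cache)}")
    print(f"RAM to cache all {index_gb:.0f}GB with a 2GB heap: "
          f"{ram_to_cache_whole_index(index_gb, 2.0):.1f}GB")
```

Under these assumptions, the 10GB heap leaves only 1.5GB of cache, or 5% of a 30GB index. A 2GB heap would leave 9.5GB, inside the suggested range. Caching the whole index would take 32.5GB, which is consistent with the "more than 32GB" figure.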
> With a little research, I came up (on my own) with what I think is a
> decent rule of thumb, and I'm curious what the experts think of this
> idea:  Find out how much space is taken by the index files with the
> following extensions: fnm, fdx, frq, nrm, tii, tis, and tvx.  Think of
> that as a bare minimum disk cache size, then shoot for between 1.5 and 3
> times that value for your disk cache, so it can also cache parts of the
> other files.
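The rule of thumb above could be scripted roughly as follows. This is a sketch, not a Solr tool. It assumes a flat index directory that uses the pre-4.0 Lucene file formats named in the message. The script name and path in the usage comment are hypothetical. If the index uses the compound file format (.cfs), these files are packed inside the .cfs and will not be counted, so the estimate will come out low.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Extensions listed in the message: field infos, stored-field index, term
# frequencies, norms, term-info index and dictionary, and term-vector index.
HOT_EXTENSIONS = {".fnm", ".fdx", ".frq", ".nrm", ".tii", ".tis", ".tvx"}


def index_file_sizes(index_dir: Path) -> dict[str, int]:
    """Map each regular file in the (flat) index directory to its size in bytes."""
    return {p.name: p.stat().st_size for p in index_dir.iterdir() if p.is_file()}


def hot_file_bytes(file_sizes: dict[str, int]) -> int:
    """Sum the sizes of files whose extension is in HOT_EXTENSIONS."""
    return sum(size for name, size in file_sizes.items()
               if Path(name).suffix.lower() in HOT_EXTENSIONS)


def suggested_cache_range(hot_bytes: int) -> tuple[float, float]:
    """Rule of thumb from the message: 1.5x to 3x the 'hot' file total."""
    return 1.5 * hot_bytes, 3.0 * hot_bytes


if __name__ == "__main__":
    # Hypothetical usage: python cache_estimate.py /path/to/solr/data/index
    index_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    hot = hot_file_bytes(index_file_sizes(index_dir))
    low, high = suggested_cache_range(hot)
    gib = 1024 ** 3
    print(f"hot files: {hot / gib:.2f} GiB")
    print(f"suggested disk cache: {low / gib:.2f} to {high / gib:.2f} GiB")
```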
> Thanks,
> Shawn
Ditto on that question
