lucene-solr-user mailing list archives

From Lance Norskog <>
Subject Re: Possible memory leaks with frequent replication
Date Wed, 03 Nov 2010 23:09:19 GMT
Do you use EmbeddedSolr in the query server? There is a memory leak
that shows up when taking a lot of replications.

On Wed, Nov 3, 2010 at 8:28 AM, Jonathan Rochkind <> wrote:
> Ah, but reading Peter's email message I referenced more carefully, it seems
> that Solr already DOES provide an info-level log warning you about
> over-lapping warming, awesome. (But again, I'm pretty sure it does NOT throw
> an exception or return an HTTP error in that condition, based on my and
> others' experience.)
>> To check if your Solr environment is suffering from this, turn on INFO
>> level logging, and look for: 'PERFORMANCE WARNING: Overlapping
>> onDeckSearchers=x'.
> Sweet, good to know, and I'll definitely add this to my debugging toolbox.
> Peter's listserv message really ought to be a wiki page, I think.  Any
> reason for me not to just add it as a new one with title "Commit frequency
> and auto-warming" or something like that?  Unless it's already in the wiki
> somewhere I haven't found, assuming the wiki will let an ordinary
> user-created account add a new page.
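For reference, checking a slave for this condition can be a one-liner over the servlet container's log. A minimal sketch, assuming INFO logging is enabled; the log path and sample line here are placeholders, not Solr defaults:

```shell
# Hypothetical log location; point this at your servlet container's Solr log.
LOG=/tmp/solr-sample.log
# For demonstration only, write one line shaped like the warning Solr logs:
echo 'INFO: [core0] PERFORMANCE WARNING: Overlapping onDeckSearchers=2' > "$LOG"
# Count occurrences of the overlapping-warming warning:
grep -c "PERFORMANCE WARNING: Overlapping onDeckSearchers" "$LOG"
```

If the count climbs over time, commits/replications are arriving faster than warming completes.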
> Jonathan Rochkind wrote:
>> I hadn't looked at the code, am not familiar with the Solr code, and can't
>> say what that code does.
>> But I have experienced issues that I _believe_ were caused by too-frequent
>> commits causing over-lapping searcher preparation. And I've definitely seen
>> Solr documentation that suggests this is an issue. Let me find it now to see
>> if the experts think these documented suggestions are still correct or not:
>> "On the other hand, autowarming (populating) a new collection could take a
>> lot of time, especially since it uses only one thread and one CPU. If your
>> settings fire off snapinstaller too frequently, then a Solr slave could be
>> in the undesirable condition of handing-off queries to one (old) collection,
>> and, while warming a new collection, a second “new” one could be snapped and
>> begin warming!
>> If we attempted to solve such a situation, we would have to invalidate the
>> first “new” collection in order to use the second one, then when a “third”
>> new collection would be snapped and warmed, we would have to invalidate the
>> “second” new collection, and so on ad infinitum. A completely warmed
>> collection would never make it to full term before it was aborted. This can
>> be prevented with a properly tuned configuration so new collections do not
>> get installed too rapidly. "
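The knobs that passage is talking about live in solrconfig.xml. A hedged sketch with illustrative values, not recommendations:

```xml
<!-- solrconfig.xml on the slave: values are illustrative, tune per index -->
<query>
  <!-- Cap on concurrent warming searchers; commits beyond this fail fast
       instead of piling up half-warmed collections. -->
  <maxWarmingSearchers>2</maxWarmingSearchers>
  <!-- autowarmCount is how many entries get copied into the new searcher's
       cache during warming; 0 disables autowarming for this cache. -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
               autowarmCount="0"/>
</query>
```

Lower autowarmCount shortens warming; a lower maxWarmingSearchers bounds how far the overlap can spiral.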
>> I think I've seen that same advice on another wiki page without being
>> specifically regarding replication, but just being about commit frequency
>> balanced with auto-warming, leading to overlapping warming, leading to
>> spiraling RAM/CPU usage -- but NOT an exception being thrown or HTTP error
>> delivered.
>> I can't find it on the wiki, but here's a listserv post with someone
>> reporting findings that match my understanding:
>> How does this advice square with the code Lance found?  Is my
>> understanding of how frequent commits can interact with time it takes to
>> warm a new collection correct? Appreciate any additional info.
>> Lance Norskog wrote:
>>> Isn't that what this code does?
>>>      onDeckSearchers++;
>>>      if (onDeckSearchers < 1) {
>>>        // should never happen... just a sanity check
>>>        log.error(logid+"ERROR!!! onDeckSearchers is " + onDeckSearchers);
>>>        onDeckSearchers=1;  // reset
>>>      } else if (onDeckSearchers > maxWarmingSearchers) {
>>>        onDeckSearchers--;
>>>        String msg="Error opening new searcher. exceeded limit of maxWarmingSearchers="+maxWarmingSearchers + ", try again later.";
>>>        log.warn(logid+""+ msg);
>>>        // HTTP 503==service unavailable, or 409==Conflict
>>>        throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE,msg,true);
>>>      } else if (onDeckSearchers > 1) {
>>>        log.info(logid+"PERFORMANCE WARNING: Overlapping onDeckSearchers=" + onDeckSearchers);
>>>      }
>>> On Tue, Nov 2, 2010 at 10:02 AM, Jonathan Rochkind <>
>>> wrote:
>>>> It's definitely a known 'issue' that you can't replicate (or do any
>>>> other kind of index change, including a commit) at a faster frequency
>>>> than your warming queries take to complete, or you'll wind up with
>>>> something like you've seen. It's in some documentation somewhere I saw,
>>>> for sure.
>>>> The advice to 'just query against the master' is kind of odd, because,
>>>> then... why have a slave at all, if you aren't going to query against
>>>> it? I guess just for backup purposes.
>>>> But even with just one Solr, or querying the master, if you commit at a
>>>> rate such that commits come before the warming queries can complete,
>>>> you're going to have the same issue.
>>>> The only answer I know of is "Don't commit (or replicate) at a faster
>>>> rate than it takes your warming to complete." You can reduce your
>>>> warming queries/operations, or reduce your commit/replicate frequency.
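Reducing the warming work usually means trimming the newSearcher listener in solrconfig.xml. A minimal sketch; the query values are placeholders for whatever your hottest real queries are:

```xml
<!-- Fired each time a new searcher opens; keep this list short so warming
     finishes before the next commit or replication arrives. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- placeholder warming query; substitute a representative real one -->
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>
```

Each query here runs single-threaded against the new searcher before it serves traffic, so the list length directly sets the minimum safe commit interval.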
>>>> It would be interesting/useful if Solr noticed this going on and gave
>>>> you some kind of error in the log (or even an exception when started
>>>> with a certain parameter for testing): "Overlapping warming queries,
>>>> you're committing too fast" or something. Because it's easy to make this
>>>> happen without realizing it, and then your Solr does what Simon says:
>>>> runs out of RAM and/or uses a whole lot of CPU and disk I/O.
>>>> Lance Norskog wrote:
>>>>> You should query against the indexer. I'm impressed that you got 5s
>>>>> replication to work reliably.
>>>>> On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow <>
>>>>> wrote:
>>>>>> We've been trying to get a setup in which a slave replicates from the
>>>>>> master every few seconds (ideally every second, but currently we have
>>>>>> it set at every 5s).
>>>>>> Everything seems to work fine until, periodically, the slave just
>>>>>> stops responding from what looks like it running out of memory:
>>>>>> org.apache.catalina.core.StandardWrapperValve invoke
>>>>>> SEVERE: Servlet.service() for servlet jsp threw exception
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>> (our monitoring seems to confirm this).
>>>>>> Looking around, my suspicion is that it takes new Readers longer to
>>>>>> warm than the gap between replications, and thus they just build up
>>>>>> until memory is consumed (which, I suppose, isn't really memory
>>>>>> 'leaking' per se, more just resource consumption).
>>>>>> That said, we've tried turning off caching on the slave and that
>>>>>> didn't help either, so it's possible I'm wrong.
>>>>>> Is there anything we can do about this? I'm reluctant to increase the
>>>>>> heap space since I suspect that will just mean a longer period between
>>>>>> failures. Might Zoie help here? Or should we just query against the
>>>>>> master?
>>>>>> Thanks,
>>>>>> Simon
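For what it's worth, the 5s interval Simon describes is the pollInterval in the slave's replication handler. A sketch of backing it off; the host, port, and core path are placeholders:

```xml
<!-- solrconfig.xml on the slave; masterUrl is a placeholder -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- hh:mm:ss format; poll every 60s instead of 5s so each new
         searcher can finish warming before the next snapshot lands -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The interval only needs to exceed the slave's worst-case warming time, which the INFO-level overlapping-onDeckSearchers warning helps you measure.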

Lance Norskog
