lucene-solr-user mailing list archives

From Jeremy Ashcraft <jashcr...@edgate.com>
Subject Re: solr blocking and client timeout issue
Date Thu, 23 Jul 2015 15:56:54 GMT
A quick follow-up: after finding and eliminating some code that was
generating multiple update requests per second, applying the CMS GC
tuning options, and upgrading to Java 8, we've not experienced a single
long GC pause. The Java 8 upgrade got rid of the last couple of pauses
we were still seeing during the day.
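The first fix above, removing code that sent several update requests per second, usually comes down to batching on the client. Below is a minimal Java sketch of that idea under stated assumptions: `UpdateCoalescer` is a hypothetical class (not part of Solr or SolrJ) that buffers documents and passes them to a caller-supplied sender as one batch at most once per interval, or sooner if the batch grows large. The sender stands in for whatever actually posts the batch to `/update` and is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side helper (not part of Solr or SolrJ): buffers
// documents and hands them to a sender as a single batch, at most once per
// interval or as soon as the buffer reaches maxBatchSize.
class UpdateCoalescer<T> {
    private final long intervalMs;           // minimum time between batches
    private final int maxBatchSize;          // flush early once this many are queued
    private final Consumer<List<T>> sender;  // assumed to send one update request per batch
    private final List<T> buffer = new ArrayList<>();
    private long lastFlushMs;

    UpdateCoalescer(long intervalMs, int maxBatchSize, long startMs, Consumer<List<T>> sender) {
        this.intervalMs = intervalMs;
        this.maxBatchSize = maxBatchSize;
        this.sender = sender;
        this.lastFlushMs = startMs;
    }

    // Queue one document. The current time is passed in so the logic can be
    // tested without real clocks or threads.
    void add(T doc, long nowMs) {
        buffer.add(doc);
        if (buffer.size() >= maxBatchSize || nowMs - lastFlushMs >= intervalMs) {
            flush(nowMs);
        }
    }

    // Send everything buffered so far as one batch; does nothing if empty.
    void flush(long nowMs) {
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
        lastFlushMs = nowMs;
    }
}
```

Because this sketch only checks the clock when a document arrives, a real client would also call `flush` from a timer and on shutdown so a trailing partial batch is not left waiting. Leaving visibility to the server's `autoSoftCommit` setting, rather than committing with each batch, keeps the commit rate independent of the update rate.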

Thanks for all the help and suggestions.

On 7/21/2015 1:10 AM, Daniel Collins wrote:
> We have a similar situation: production runs Java 7u10 (yes, we know it's
> old!), uses custom GC options (G1 works well for us), and has a 40GB heap.
> We are a heavy user of NRT (sub-second soft commits!), so that may be the
> common factor here.
>
> Every time we have tried a later Java 7 or Java 8, the heap blows up in no
> time at all.  We are still investigating the root cause (we do need to
> migrate to Java 8), but I'm thinking that very high commit rates seem to be
> the common link here (and it's not a common Solr use case, I admit).
>
> I don't have any silver bullet answers to offer yet, but my
> suspicion/conjecture (no real evidence yet, I admit) is that the frequent
> commits are leaving temporary objects around (which they are entitled to
> do), and something has changed in the GC in later Java 7/8 which means they
> are slower to get rid of those, hence the overall heap usage is higher
> under this use case.
>
> @Jeremy, you don't have a lot of headroom, but have you tried a larger heap?
> Could you go to 6GB and see if that at least delays the issue?
>
> Erick is correct though, if you can reduce the commit rate, I'm sure that
> would alleviate the issue.
>
> On 21 July 2015 at 05:31, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> bq: the config is set up per the NRT suggestions in the docs.
>> autoSoftCommit every 2 seconds and autoCommit every 10 minutes.
>>
>> A 2-second soft commit is very aggressive, no matter what the NRT
>> suggestions are. My first question is whether that's really needed.
>> The soft commit interval should be as long as you can stand. And don't
>> listen to your product manager who says "2 seconds is required"; push back
>> and ask whether that's really necessary. Most people won't notice
>> the difference.
>>
>> bq: ...we are noticing a lot higher number of hard commits than usual.
>>
>> Is a client somewhere issuing a hard commit? This is rarely
>> recommended... And is openSearcher true or false? False is a
>> relatively cheap operation, true is quite expensive.
>>
>> More than you want to know about hard and soft commits:
>>
>>
>> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Best,
>> Erick
>>
>> On Mon, Jul 20, 2015 at 12:48 PM, Jeremy Ashcraft <jashcraft@edgate.com> wrote:
>>> Heap is already at 5GB.
>>>
>>> On 07/20/2015 12:29 PM, Jeremy Ashcraft wrote:
>>>> No swapping that I'm seeing, although we are noticing a much higher
>>>> number of hard commits than usual.
>>>>
>>>> The config is set up per the NRT suggestions in the docs: autoSoftCommit
>>>> every 2 seconds and autoCommit every 10 minutes.
>>>>
>>>> There have been 463 updates in the past 2 hours, all followed by hard
>>>> commits.
>>>>
>>>> INFO  - 2015-07-20 12:26:20.979; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>>>> INFO  - 2015-07-20 12:26:21.021; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit: commits: num=2
>>>>     commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nk,generation=665696}
>>>>     commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nl,generation=665697}
>>>> INFO  - 2015-07-20 12:26:21.022; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 665697
>>>> INFO  - 2015-07-20 12:26:21.026; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>>>> INFO  - 2015-07-20 12:26:21.026; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update params={omitHeader=false&wt=json} {add=[8653ea29-a327-4a54-9b00-8468241f2d7c (1507244513403338752), 5cf034a9-d93a-4307-a367-02cb21fa8e35 (1507244513404387328), 816e3a04-9d0e-4587-a3ee-9f9e7b0c7d74 (1507244513405435904)],commit=} 0 50
>>>>
>>>> Could that be causing some of the problems?
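The log excerpt above shows a hard commit (`softCommit=false`) that opens a new searcher (`openSearcher=true`), and the `commit=` entry at the end of the `LogUpdateProcessor` line appears to indicate that the commit arrived with the update request itself rather than from `autoCommit`. One way to answer Erick's question across a whole log is to count the `start commit{...}` lines by type. The Java sketch below does that; `CommitLogStats` is a hypothetical helper, and it assumes each log entry sits on a single line in the format shown in this thread, which may differ between Solr versions.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: classifies DirectUpdateHandler2 "start commit{...}"
// log lines. Assumes one log entry per line, in the format quoted above.
class CommitLogStats {
    private static final Pattern START_COMMIT =
        Pattern.compile("DirectUpdateHandler2; start commit\\{([^}]*)\\}");

    int hardOpenSearcher;  // hard commit that opens a new searcher (expensive)
    int hardNoSearcher;    // hard commit with openSearcher=false (relatively cheap)
    int soft;              // soft commits

    static CommitLogStats scan(List<String> lines) {
        CommitLogStats s = new CommitLogStats();
        for (String line : lines) {
            Matcher m = START_COMMIT.matcher(line);
            if (!m.find()) {
                continue;  // not a commit-start line
            }
            String flags = m.group(1);
            // Check softCommit first: soft commits also log openSearcher=true.
            if (flags.contains("softCommit=true")) {
                s.soft++;
            } else if (flags.contains("openSearcher=true")) {
                s.hardOpenSearcher++;
            } else {
                s.hardNoSearcher++;
            }
        }
        return s;
    }
}
```

For scale: 463 hard commits in 2 hours is one every 7200 / 463 ≈ 15.6 seconds, each opening a new searcher, on top of the soft commits every 2 seconds.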
>>>>
>>>> ________________________________________
>>>> From: Shawn Heisey <apache@elyograg.org>
>>>> Sent: Monday, July 20, 2015 11:44 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: solr blocking and client timeout issue
>>>>
>>>> On 7/20/2015 11:54 AM, Jeremy Ashcraft wrote:
>>>>> I'm upgrading to the 1.8 JDK on our dev VM now and testing. Hopefully
>>>>> I can get production upgraded tonight.
>>>>>
>>>>> Still getting the big GC pauses this morning, even after applying the
>>>>> GC tuning options.  Everything was fine throughout the weekend.
>>>>>
>>>>> My biggest concern is that this instance had been running with no
>>>>> issues for almost 2 years, but these GC issues started just last week.
>>>> It's very possible that you're simply going to need a larger heap than
>>>> you have needed in the past, either because your index has grown, or
>>>> because your query patterns have changed and now your queries need more
>>>> memory.  It could even be both of these.
>>>>
>>>> At your current index size, assuming that there's nothing else on this
>>>> machine, you should have enough memory to raise your heap to 5GB.
>>>>
>>>> If there ARE other software pieces on this machine, then the long GC
>>>> pauses (along with other performance issues) could be explained by too
>>>> much memory allocation out of the 8GB total memory, resulting in
>>>> swapping at the OS level.
>>>>
>>>> Thanks,
>>>> Shawn
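Shawn's sizing advice is a simple budget: whatever the heap and any other processes take out of physical RAM is what's left for the OS, which uses it mostly to cache the index files. A minimal Java sketch of that arithmetic follows; `MemoryBudget` is a hypothetical helper, the inputs are the figures from this thread (8GB total, a 5GB heap, Daniel's suggested 6GB), and it ignores the JVM's own overhead beyond the heap, so real headroom is somewhat smaller.

```java
// Hypothetical helper: rough memory budget for a dedicated Solr host.
// Ignores JVM overhead beyond the heap (thread stacks, metaspace, etc.),
// so the real figure is somewhat lower than what this returns.
class MemoryBudget {
    // RAM left for the OS (mostly page cache for index files).
    // A negative value means allocations exceed physical RAM, so the OS must swap.
    static double osHeadroomGb(double totalRamGb, double heapGb, double otherProcessesGb) {
        return totalRamGb - heapGb - otherProcessesGb;
    }
}
```

With nothing else on the machine, a 5GB heap leaves about 3GB for the OS and page cache, and 6GB leaves about 2GB; if other software pushes the total past 8GB, the result goes negative, which is the swapping scenario Shawn describes.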
>>>>
>>> --
>>> *jeremy ashcraft*
>>> development manager
>>> EdGate Correlation Services <http://correlation.edgate.com>
>>> /253.853.7133 x228/

