lucene-solr-user mailing list archives

From Amrit Sarkar <sarkaramr...@gmail.com>
Subject Re: SolrCloud 6.6 stability challenges
Date Sat, 04 Nov 2017 13:20:09 GMT
Pretty much what Emir has stated. I would like to know more about what you saw here:

> all of this runs perfectly ok when indexing isn't happening. as soon as
> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> minutes.


When you say "NRT" indexing, what is your commit strategy during indexing?
With the autocommit interval set so high, are you committing explicitly after
each batch? If so, how many documents go into each commit?
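
To show why the commit strategy matters, here is a rough back-of-envelope
sketch in Python. It is not a measurement of your cluster: the function names
are made up for illustration, and the batches-per-minute and warm-up figures
are assumptions. Only the indexer count (7, the upper end of your 5 to 7) and
the warming searcher limit (5) come from your mail.

```python
import math


def commits_per_minute(indexers, batches_per_minute):
    """Explicit commits per minute if every indexer commits after each batch."""
    return indexers * batches_per_minute


def overlapping_warming_searchers(commit_rate_per_minute, warm_seconds):
    """Rough steady-state number of searchers warming at once.

    Uses arrival rate x duration (Little's law), rounded up.
    """
    return math.ceil(commit_rate_per_minute * warm_seconds / 60.0)


# 7 indexers is the upper end quoted in the thread.
# 4 batches per minute per indexer and a 15 s warm-up are ASSUMED values.
rate = commits_per_minute(7, 4)
overlap = overlapping_warming_searchers(rate, 15)

configured_limit = 5  # the "autowarmingsearchers is 5" setting from the thread
print(rate, overlap, overlap > configured_limit)
```

If the real numbers look anything like this, per-batch commits would open
searchers faster than they can warm, which is why the actual commit frequency
matters here.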

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Sat, Nov 4, 2017 at 2:47 PM, Emir Arnautović <
emir.arnautovic@sematext.com> wrote:

> Hi Rick,
> Do you see any errors in the logs? Do you have any monitoring tool? Maybe you
> can check heap and GC metrics around the time the incident happened. It is not
> a large heap, but a major GC could cause a pause long enough to trigger a
> snowball effect and end up with the node in recovery state.
> What indexing rate do you observe? Why do you have max warming searchers at 5
> (is that what you meant by autowarmingsearchers?) when you commit every 5 min?
> Why did you increase it? Did you see errors with the default of 2? Maybe you
> commit after every bulk?
> Do you see similar behaviour when you only index, without queries?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 4 Nov 2017, at 05:15, Rick Dig <teramera@gmail.com> wrote:
> >
> > hello all,
> > we are trying to run solrcloud 6.6 in a production setting.
> > here's our config and issue
> > 1) 3 nodes, 1 shard, replication factor 3
> > 2) all nodes are 16GB RAM, 4 core
> > 3) Our production load is about 2000 requests per minute
> > 4) index is fairly small, index size is around 400 MB with 300k documents
> > 5) autocommit is currently set to 5 minutes (even though ideally we would
> > like a smaller interval).
> > 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
> > 7) all of this runs perfectly ok when indexing isn't happening. as soon as
> > we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> > minutes. from this point on the nodes never recover unless we stop
> > indexing.  the master usually is the last one to fall.
> > 8) there are maybe 5 to 7 processes indexing at the same time with document
> > batch sizes of 500.
> > 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
> > 10) no cpu and / or oom issues that we can see.
> > 11) cpu load does go fairly high 15 to 20 at times.
> > any help or pointers appreciated
> >
> > thanks
> > rick
>
>
