lucene-solr-user mailing list archives

From danny teichthal <dannyt...@gmail.com>
Subject Re: SolrCloud updates are slow on replica (DBQ?)
Date Sun, 21 Jun 2015 16:37:47 GMT
Thanks Erick,
Actually, we only recently started to use autoSoftCommit because of
the performance warning; that was after reading the first link you provided.
Our application does frequent updates from batch and online requests.
Until now we issued a soft commit after each user transaction finished. We
were able to remove the vast majority of the manual commits and left a few
cases where the commit was essential for the screens not to fail.
Due to business requirements, 2 seconds is the maximum we can do for now.
But if you say that a few more seconds will make a difference, we will try
to increase it.

As for GC, we continuously check the GC logs and can say for sure that it
is not the problem in our case.
Regarding the caches - we don't use autowarming at all. To my limited
understanding the caches are not that big; please correct me if I'm wrong.

<queryResultCache class="solr.LRUCache" size="64"
initialSize="32" autowarmCount="0" />
<documentCache class="solr.FastLRUCache" size="4096"
initialSize="1024" autowarmCount="0" />
<filterCache class="solr.FastLRUCache" size="4096"
initialSize="1024" autowarmCount="0" />
<fieldValueCache class="solr.FastLRUCache" size="4096"
initialSize="1024" autowarmCount="0" />
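
If we do end up having to turn the caches down as in your point 5, I assume
it would look roughly like the sketch below. The sizes are placeholders picked
only for illustration; we have not tested them and they are not a
recommendation.

```xml
<!-- Hypothetical reduced sizes (illustrative only); autowarming stays off -->
<queryResultCache class="solr.LRUCache" size="16"
                  initialSize="16" autowarmCount="0" />
<documentCache class="solr.FastLRUCache" size="512"
               initialSize="512" autowarmCount="0" />
<filterCache class="solr.FastLRUCache" size="512"
             initialSize="512" autowarmCount="0" />
<fieldValueCache class="solr.FastLRUCache" size="512"
                 initialSize="512" autowarmCount="0" />
```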



2 more questions, just for understanding:
1. What is the reason behind removing maxDocs?  If I set no limit,
couldn't the transaction log explode in case of heavy indexing?
2. Do you think that the DBQs are causing the problem or are just a symptom
of it? Is there a problem with having many DBQs?

We will probably start by removing maxDocs and setting openSearcher=false.
The link about indexing performance also looks very relevant - I will read
it thoroughly.
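
To make sure I understood the suggestions, this is roughly what I expect our
update handler to look like after the first round of changes. It is only a
sketch of our current config with maxDocs removed and openSearcher switched to
false; the maxTime values are the ones we already use, and I am assuming the
time-based hard commit is what keeps the transaction log bounded.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- maxDocs removed: the hard commit is now purely time-based -->
    <maxTime>120000</maxTime>
    <!-- flush to disk without opening a new searcher -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- maxDocs removed: 2 seconds is our current business limit -->
    <maxTime>2000</maxTime>
  </autoSoftCommit>
  <updateLog />
</updateHandler>
```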

Thanks again,





On Sun, Jun 21, 2015 at 6:29 PM, Erick Erickson <erickerickson@gmail.com>
wrote:

> The very first thing I would do is straighten out your commit strategy;
> your commits are _very_ aggressive. I'd guess you're also seeing warnings in
> the logs about "too many on deck searchers" or something like that, or
> you've upped your max warming searchers in solrconfig.xml.
>
> Soft commits aren't free. They're less expensive than hard
> commits (openSearcher=true), but they're not free. Here's a long
> writeup on this:
>
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> What I'd do:
> 1> remove maxDocs entirely
> 2> set openSearcher=false for your autoCommit
> 3> remove maxDocs from your autoSoftCommit
> 4> lengthen the soft commit as much as you can stand.
> 5> if you must have very short soft commits, consider
>    turning off (or at least down) your caches in solrconfig.xml
> 6> stop issuing any kind of commits from the client. This is
>    an anti-pattern except in very unusual circumstances and
>    in your setup you see all the docs 2 seconds later anyway
>    so it is doing you no good and (maybe) active harm.
>
> If the problem persists, try looking at your garbage collection,
> you may well be hitting long GC pauses.
>
> Also note that there was a bottleneck in Solr prior to 5.2
> when replicas were present, see:
> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>
> Best,
> Erick
>
> On Sun, Jun 21, 2015 at 7:14 AM, danny teichthal <dannytei1@gmail.com>
> wrote:
> > Hi,
> >
> >
> > We are experiencing some intermittent slowness on updates for one of our
> > collections.
> >
> > We see user operations hanging on updates to Solr via the SolrJ client.
> >
> > Every time during a period of slowness we see something like this in the
> > log of the replica:
> >
> > [org.apache.solr.update.UpdateHandler] Reordered DBQs detected.
> > Update=add{_version_=1504391336428568576,id=2392581250002321}
> > DBQs=[DBQ{version=1504391337298886656,q=level_2_id:12345}]
> >
> > After a while the DBQs pile up and we see the list of DBQs growing.
> >
> > At some point the update time increases from 300 ms to 20 seconds, and
> > then in the leader log I see a read timeout exception and it initiates
> > recovery on the replica.
> >
> > At that point all updates become very slow, from 20 seconds to 60
> > seconds, especially updates with deleteByQuery.
> >
> > We are not sure if the DBQ is the cause or a symptom. But what does not
> > make sense to me is that the slowness is only on the replica side.
> >
> > We suspect that the updates becoming slow on the replica cause
> > a timeout on the leader side, which triggers the recovery.
> >
> > Would really appreciate any help on this.
> >
> > Thanks,
> >
> > Some info:
> >
> > DBQs are sent as separate update requests from the add requests.
> >
> > We currently use SolrCloud 4.9.0.
> >
> > We have ~140 collections on 4 nodes: 1, 2, 3, 4.
> >
> > Each collection has a single shard with a leader and another replica.
> >
> > ~70 collections are on nodes 1 and 2 as leader and replica, and the other
> > collections are on 3 and 4.
> >
> > On each node there’s about 65GB of index with 25,000,000 documents.
> >
> > This is our update handler; autoSoftCommit is set to 2 seconds, but there
> > may be manual soft commits coming from user operations from time to time:
> >
> > <updateHandler class="solr.DirectUpdateHandler2">
> >   <autoCommit>
> >     <maxDocs>10000</maxDocs>
> >     <maxTime>120000</maxTime>
> >     <openSearcher>true</openSearcher>
> >   </autoCommit>
> >   <autoSoftCommit>
> >     <maxDocs>1000</maxDocs>
> >     <maxTime>2000</maxTime>
> >   </autoSoftCommit>
> >   <updateLog />
> > </updateHandler>
>
