lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Preventing solr cache flush when committing
Date Wed, 25 Apr 2018 20:37:07 GMT
Had this typed up yesterday and forgot to send.

"Is there no way to ensure that the top level filter caches are not
expunged when some documents are added to the index and have the
changes available at the same time?"

No. And it's not something that you can do without major architectural
changes. When you commit, background merging kicks in which will
renumber the _internal_ Lucene document ID. This ID ranges 0-maxDoc
and is used as the bit to set in the filterCache object. So if you
preserved the filterCache, the bits will be wrong. The
queryResultCache is
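
To make the renumbering problem concrete, here is a toy sketch in Python. It is not Lucene code: the "index" is just a list, the internal document ID is the list position, and merge_dropping_deletes is a hypothetical stand-in for what a merge does when it drops deleted documents. It only illustrates why bits computed before the merge point at the wrong documents afterwards.

```python
def build_filter_bits(docs, predicate):
    """Return the set of internal doc IDs (list positions) that match."""
    return {doc_id for doc_id, doc in enumerate(docs) if predicate(doc)}


def merge_dropping_deletes(docs, deleted_ids):
    """Toy merge: deleted docs vanish and the survivors are renumbered from 0."""
    return [doc for doc_id, doc in enumerate(docs) if doc_id not in deleted_ids]


def is_red(doc):
    return doc["color"] == "red"


before = [
    {"id": "a", "color": "red"},
    {"id": "b", "color": "blue"},
    {"id": "c", "color": "red"},
    {"id": "d", "color": "blue"},
]

# Bits cached for the filter color:red against the old numbering.
cached_bits = build_filter_bits(before, is_red)

# Document "a" (internal ID 0) was deleted; the merge renumbers the rest.
after = merge_dropping_deletes(before, deleted_ids={0})

# What the filter should match under the new numbering.
fresh_bits = build_filter_bits(after, is_red)

# What you would get by reusing the old bits against the new numbering.
stale_hits = [after[i]["id"] for i in sorted(cached_bits) if i < len(after)]
```

Here the reused bits select "b" and "d", neither of which is red, while the only correct match ("c") is missed.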


"If that is the case, then do I need to always have to rely on warmup
of caches to get some documents in caches?"

Yes, that's exactly what the "autowarm" feature on the caches is for. Also
the newSearcher event can be used to hand-craft warmup searches where
you know certain things about the index and you specifically want to
ensure certain warming.

Please start out with modest numbers for autowarm, as in 20-30. It's
very often the case that you don't need much more than that. What
those numbers do in filterCache and queryResultCache is re-execute the
associated fq or q clause, respectively.
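
A rough Python sketch of what "re-execute the associated clause" means. This is a simplified model, not Solr's implementation: the cache is an OrderedDict whose last entries are treated as the most recently used, and run_filter is a hypothetical function that evaluates an fq string against the new index. The point is that warmed entries are recomputed, never copied.

```python
from collections import OrderedDict


def autowarm(old_cache, run_filter, autowarm_count):
    """Re-run the most recent autowarm_count keys against the new index."""
    keys = list(old_cache.keys())
    recent = keys[max(0, len(keys) - autowarm_count):]
    new_cache = OrderedDict()
    for fq in recent:
        new_cache[fq] = run_filter(fq)  # recomputed; the old value is ignored
    return new_cache


# Hypothetical new index; list position plays the role of the internal doc ID.
new_index = [{"color": "blue"}, {"color": "red"}, {"color": "red"}]


def run_filter(fq):
    """Evaluate a 'field:value' filter against new_index."""
    field, value = fq.split(":", 1)
    return {i for i, doc in enumerate(new_index) if doc.get(field) == value}


# Old cache entries hold doc ID sets that belonged to the previous searcher.
old_cache = OrderedDict([
    ("color:green", {5}),
    ("color:red", {0, 2}),
    ("color:blue", {1, 3}),
])

warmed = autowarm(old_cache, run_filter, autowarm_count=2)
```

Only the two most recent filters are carried over, and their values come from the new index. The cost of warming grows with the count, which is why a modest number is a sensible starting point.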

"Are there any other approaches then warmup which folks usually do to
avoid this; if they want to build a fast searchable product and having
some write throughput as well?" and " I can't afford to get my cached
flushed".

What evidence do you have for this last statement?

"Currently I do commits via my indexing application (after every batch
of documents)"

Please, please, please do _not_ do this. It's especially egregious
because you do it after every batch of docs. So rather than flushing
your caches every 5 minutes (say), you hammer Solr with commit after
commit after commit. Configure your soft commit interval to your
latency requirements and forget about it. Or just configure hard
commit with openSearcher set to true. Or perhaps even just specify
commitWithin when you send docs to Solr. At a guess you may have seen
warnings about "too many on deck searchers" if your commit interval is
shorter than your autowarm time.
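
As an illustration of the commitWithin option, here is a minimal Python sketch that builds (but does not send) an update request. The host, port, collection name and the 60-second value are placeholders I made up for the example; the only Solr-specific pieces are the /update handler and the commitWithin request parameter. Note that the client never issues an explicit commit.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, collection, docs, commit_within_ms):
    """Build a POST to the update handler that asks Solr to commit within N ms."""
    query = urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url}/{collection}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: adjust to your own installation and latency needs.
req = build_update_request(
    "http://localhost:8983/solr",
    "mycollection",
    [{"id": "1"}],
    commit_within_ms=60000,
)
# To actually send it: urllib.request.urlopen(req)
```

Every batch can be sent this way; Solr decides when to commit, so many batches within the window share one commit instead of each triggering its own.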

I'll bend a little bit if the client only issues a commit at the very
end of the run and there's precisely one client running at a time and
you can _guarantee_ there's only one commit, but it's usually much
easier and more reliable to use the solr config settings.

Perhaps you're not entirely familiar with how openSearcher works, so
here's a brief review. This applies to either hard commit
(openSearcher=true) or soft commit.
1> a commit happens
2> a new searcher is being opened and autowarming kicks off
3> incoming searches are served by the _old_ searcher, using all the
_old_ caches.
4> autowarming completes
5a> incoming requests are routed to the new searcher
5b> the old searcher finishes serving the outstanding requests
received before <4> and closes
6> the old caches are flushed.
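
The steps above can be sketched as a small Python model. This is only a toy (the ToyCore class and its method names are made up, and it ignores overlapping commits and in-flight requests), but it shows the key property: searches keep hitting the old searcher and its caches until warming of the new one has finished.

```python
class ToyCore:
    """Toy model of the searcher swap; not how Solr is implemented."""

    def __init__(self):
        self.generation = 0
        self.active = {"gen": 0, "cache": {}}   # searcher serving requests
        self.warming = None                     # searcher being warmed, if any

    def commit(self):
        # Steps 1-2: a commit opens a new searcher, which starts warming.
        self.generation += 1
        self.warming = {"gen": self.generation, "cache": {}}

    def search(self, fq):
        # Step 3: requests are served by the active (possibly old) searcher.
        searcher = self.active
        searcher["cache"].setdefault(fq, f"docset@gen{searcher['gen']}")
        return searcher["gen"]

    def finish_warming(self):
        # Step 4: warming re-executes the old cache keys on the new searcher.
        for fq in self.active["cache"]:
            self.warming["cache"][fq] = f"docset@gen{self.warming['gen']}"
        # Steps 5-6: new requests go to the new searcher; old caches are dropped.
        old, self.active, self.warming = self.active, self.warming, None
        old["cache"].clear()


core = ToyCore()
gen_before_commit = core.search("color:red")
core.commit()
gen_while_warming = core.search("color:red")   # still the old searcher
core.finish_warming()
gen_after_swap = core.search("color:red")      # now the new searcher
```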

So having high read throughput

On Tue, Apr 24, 2018 at 10:36 AM, Lee Carroll
<lee.a.carroll@googlemail.com> wrote:
> From memory try the following:
> Don't manually commit from client after batch indexing
> set soft commit to be a long time interval. As long as acceptable to run
> stale, say 5 mins or longer if you can.
> set hard commit to be short   (seconds ) to keep everything neat and tidy
> regards updates and avoid backing up log files
> set openSearcher=false
>
> I'm pretty sure that works for at least one of our indices. It's worth a go.
>
> Lee C
>
> On 24 April 2018 at 06:56, Papa Pappu <tuhaipappu@gmail.com> wrote:
>
>> Hi,
>> I've written down my query over stack-overflow. Here is the link for that :
>> https://stackoverflow.com/questions/49993681/preventing-solr-cache-flush-when-commiting
>>
>> In short, I am facing troubles maintaining my solr caches when commits
>> happen and the question provides detailed description of the same.
>>
>> Based on my use-case if someone can recommend what settings I should use or
>> practices I should follow it'll be really helpful.
>>
>> Thanks and regards,
>> Dmitri
>>
