lucene-java-user mailing list archives

From Rob Audenaerde <rob.audenae...@gmail.com>
Subject Re: commit frequency guideline?
Date Wed, 30 Nov 2016 14:37:11 GMT
Thanks for the quick reply!

>What do you mean by "Lucene complain about too-many uncommitted docs"?

--> Good question. I was thoughtlessly echoing words from my colleague. I
asked him, and he said it was about commits taking very long and about
memory issues. So maybe this wasn't the best opening statement :)

For the other part of the question: we need users to see the changed
documents immediately, but I think we have this covered by using NRT
Readers and the SearcherManager.

Am I correct to conclude that calling commit() is not necessary for finding
recently changed documents?
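
To make the question concrete, here is a minimal sketch of the setup I have in mind. It assumes a recent Lucene version (ByteBuffersDirectory; older releases would use RAMDirectory or an FSDirectory), and the class name, field name and id value are made up for illustration. The SearcherManager is opened on the IndexWriter, so a refresh should expose the change without any explicit commit() call:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo class; not part of Lucene.
class NrtVisibilityDemo {

    /** Indexes one document and counts it through an NRT searcher, with no explicit commit(). */
    static int countAfterRefresh() throws Exception {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            // Opening the manager on the writer gives near-real-time readers.
            SearcherManager manager = new SearcherManager(writer, null);
            try {
                Document doc = new Document();
                doc.add(new StringField("id", "42", Field.Store.YES));
                // Add or replace the document; note that commit() is never called here.
                writer.updateDocument(new Term("id", "42"), doc);

                // Reopen the reader so the change becomes searchable.
                manager.maybeRefreshBlocking();

                IndexSearcher searcher = manager.acquire();
                try {
                    return searcher.count(new TermQuery(new Term("id", "42")));
                } finally {
                    // Always release what was acquired.
                    manager.release(searcher);
                }
            } finally {
                manager.close();
            }
        }
    }
}
```

The refresh only controls visibility to searches. Durability across a crash would still depend on commit().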

I think we can then switch to a time-based commit(), where we just call
commit every 5 minutes. In effect we would lose at most 5 minutes of work
when the server somehow stops working (which we can mitigate in another
way).
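
As a rough sketch of the time-based approach, assuming a single background thread is acceptable: PeriodicCommitter and CommitAction below are hypothetical names, not Lucene classes, and the commit action is kept abstract so the snippet needs only the JDK. In our application the action would presumably be writer::commit.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a commit action at a fixed interval on a background thread. */
class PeriodicCommitter implements AutoCloseable {

    /** Stand-in for IndexWriter.commit(); with Lucene this could be writer::commit. */
    @FunctionalInterface
    public interface CommitAction {
        void commit() throws Exception;
    }

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final CommitAction action;

    PeriodicCommitter(CommitAction action, long interval, TimeUnit unit) {
        this.action = action;
        // Fixed delay: the next run is scheduled only after the previous commit
        // has finished, so slow commits never overlap or pile up.
        scheduler.scheduleWithFixedDelay(this::commitQuietly, interval, interval, unit);
    }

    private void commitQuietly() {
        try {
            action.commit();
        } catch (Exception e) {
            // Report and continue: an uncaught exception would cancel all future runs.
            System.err.println("commit failed: " + e);
        }
    }

    /** Stops the schedule, waits for a running commit, then commits one last time. */
    @Override
    public void close() throws Exception {
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.MINUTES);
        action.commit();
    }
}
```

Usage would look like `new PeriodicCommitter(writer::commit, 5, TimeUnit.MINUTES)`, closed on shutdown before the writer itself is closed. The interval is the upper bound on lost work after a crash.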

Thank you,
-Rob




On Wed, Nov 30, 2016 at 3:17 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> What do you mean by "Lucene complain about too-many uncommitted docs"?
>  Lucene does not really care how frequently you commit...
>
> How frequently you commit is really your choice, i.e. what risk you
> see of power loss / OS crash vs the cost (not just in CPU/IO work for
> the computer, but in the users not seeing the recently indexed
> documents for a while) of replaying those documents since the last
> commit when power comes back.
>
> Pushing durability back into the queue/channel can be a nice option
> too, e.g. Kafka, so that your application doesn't need to keep track
> of which docs were not yet committed.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Nov 30, 2016 at 8:50 AM, Rob Audenaerde
> <rob.audenaerde@gmail.com> wrote:
> > Hi all,
> >
> > Currently we call commit() many times on our index (about 5M docs, with
> > some 10,000-100,000 modifications during the day). Commits typically
> > get more expensive as the index grows, up to several seconds each,
> > so we want to reduce the number of calls.
> >
> > (Historically, we had Lucene complain about too-many uncommitted docs
> > sometimes, so we went with the commit often approach.)
> >
> > What is a good strategy for calling commit? Fixed frequency? After X docs?
> > Combination?
> >
> > I'm curious what is considered 'industry-standard'. Can you share some of
> > your experience?
> >
> > Thanks!
> >
> > -Rob
>
