lucenenet-user mailing list archives

From Paul Irwin <pir...@feature23.com>
Subject RE: confused about segment merging and commits
Date Wed, 27 Sep 2017 14:08:43 GMT
Thinking about this further, you should probably also consider the lock implications of the
index writer. Disposing of the writer will, by default, wait for merges to finish and then
release the lock. I don't know whether you should have the index locked when you
back up. If anyone has any thoughts here, they would be appreciated, as I'd be interested in
learning what others would do.

Paul

-----Original Message-----
From: Paul Irwin [mailto:pirwin@feature23.com] 
Sent: Wednesday, September 27, 2017 9:59 AM
To: user@lucenenet.apache.org
Subject: RE: confused about segment merging and commits

Merging indeed will happen in the background and not block unless you're waiting on an Optimize
call or WaitForMerges/Dispose(true). The idea is that the segments being merged have already
been committed. Since it's creating a new, larger segment as part of the merge instead of
altering any existing segments, this can happen safely in the background/in parallel. Once
the merge is complete, the index records that it should use the new segment in place of the old
ones, and the old segments are then deleted. Think of the information about which segments are in
the index as an atomic pointer: you do all the hard, slow work of copying data in the background
as part of the merge, and once that is complete you do the trivial, fast work of changing the pointer.
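
To make the "atomic pointer" picture concrete, here is a minimal toy model in plain Java. It is only a sketch of the idea, not how Lucene or Lucene.NET actually implements merging: the class name `SegmentPointerSketch`, its methods, and the segment names are all made up for illustration, and the slow copy step is left as a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Toy model only (hypothetical names): NOT Lucene's implementation.
final class SegmentPointerSketch {
    // The "pointer": an immutable list naming the segments that make up the index.
    private final AtomicReference<List<String>> liveSegments;

    SegmentPointerSketch(List<String> initial) {
        liveSegments = new AtomicReference<>(
                Collections.unmodifiableList(new ArrayList<>(initial)));
    }

    // What a newly opened reader would see: a stable snapshot of the segment list.
    List<String> snapshot() {
        return liveSegments.get();
    }

    // Merge the segments in toMerge into one new segment called mergedName.
    void merge(List<String> toMerge, String mergedName) {
        // Slow part (elided here): write the new segment from the old ones.
        // No existing segment is modified, so this can run in the background.

        // Fast part: atomically publish a new list that swaps old for new.
        liveSegments.updateAndGet(current -> {
            List<String> next = new ArrayList<>(current);
            next.removeAll(toMerge);
            next.add(mergedName);
            return Collections.unmodifiableList(next);
        });
        // The old segments could be deleted once nothing refers to them any more.
    }

    public static void main(String[] args) {
        SegmentPointerSketch index =
                new SegmentPointerSketch(Arrays.asList("_0", "_1", "_2"));
        List<String> before = index.snapshot();
        index.merge(Arrays.asList("_0", "_1"), "_3");
        System.out.println("snapshot taken before the merge: " + before);
        System.out.println("snapshot taken after the merge:  " + index.snapshot());
    }
}
```

A snapshot taken before the swap keeps pointing at the old list, which is why the background work never disturbs anything that has already been opened.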

With regard to your backup system, to be on the safe side I would probably call .Commit() followed
by .WaitForMerges() if you want the quickest wait, or .Optimize() if you want to "force" it
to merge before backing up. Both .WaitForMerges() and .Optimize() end up waiting on merges
to finish, but Optimize will also try to merge if needed before waiting. Conventional wisdom,
especially post-3.0 in the Java Lucene world, is to not call Optimize because Lucene can make
better decisions about that than you can. However, in my experience, Optimize is very useful
at the end of large batch index writes (e.g. daily) so that the merge doesn't have to wait
until the next batch runs. That helps with consistent day-to-day performance and semi-predictable
index run times.
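
For the incremental copy mentioned in the original question, one possible shape is a plain comparison of file names between the last backup and a directory listing taken after .Commit() and .WaitForMerges() have returned. The sketch below is plain Java with a hypothetical `BackupPlanSketch` class (not part of Lucene or Lucene.NET), and the file names in the comments are only examples. It assumes that index files are never rewritten in place under an existing name and that nothing writes to the index while the listing is taken; both assumptions should be checked against your version and setup.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not a Lucene API: plans an incremental backup
// by comparing two listings of index file names.
final class BackupPlanSketch {
    // Files in the index directory now that the backup does not have yet.
    final Set<String> toCopy;
    // Files in the backup that no longer exist in the index directory.
    final Set<String> toDelete;

    BackupPlanSketch(Set<String> backedUp, Set<String> current) {
        // Assumption: a file name identifies its content, so names are enough.
        toCopy = new TreeSet<>(current);
        toCopy.removeAll(backedUp);      // e.g. a newly merged segment's files

        toDelete = new TreeSet<>(backedUp);
        toDelete.removeAll(current);     // e.g. files of segments that were merged away
    }
}
```

Any file that your Lucene version does rewrite under the same name would have to be copied unconditionally instead of being diffed this way.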

Paul

-----Original Message-----
From: Jonathan Resnick [mailto:jresnick@gmail.com] 
Sent: Wednesday, September 27, 2017 8:59 AM
To: user@lucenenet.apache.org
Subject: confused about segment merging and commits

Hi,

I am trying to understand how segment merging interacts with commits.

Consider the following timeline of events:

1. IndexWriter is opened on an index.
2. IndexWriter is used to add/update/delete docs, but the changes are not yet committed.
3. Activity in step 2 triggers segment merging on a background thread.
4. Commit() is called on IndexWriter while merging in step 3 has not yet finished.

Does the Commit() in step 4 block while the segment merge in step 3 finishes?
If not, then when is the segment merge in 3 "committed" to the index? (i.e.
at what point would a new IndexReader see the merged segment file?) Or does segment merging
happen entirely independently of commits?

[More context: we are trying to build a backup system that copies the index files to a backup
server after every commit. Initially I thought it would be sufficient to just keep track of
file add/update/deletes since the previous commit, but if segment merging is happening concurrently
then perhaps it's not so simple?]

More generally, is there any in-depth documentation available describing how segment merging
interacts with commits (even if it's for the Java version of Lucene)?  My web searches have
not turned up much...

Many thanks,
Jonathan