lucene-java-user mailing list archives

From William Moss <will.m...@airbnb.com.INVALID>
Subject IndexWriter, DirectoryTaxonomyWriter & SearcherTaxonomyManager synchronization
Date Tue, 27 Sep 2016 00:08:10 GMT
We're using Lucene 5.2.0 (I know it's old, we're in the process of
upgrading) to handle searching over our listings here at Airbnb. I've been
digging into our realtime indexing code and how we use Lucene and I wanted
to check a few assumptions around synchronization, since we see some
periodic exceptions[1] that I can't quite explain.

First, a tiny bit of background:
1. We use facets and therefore are writing realtime updates using both
an IndexWriter and a DirectoryTaxonomyWriter.
2. We have multiple update threads, consuming messages (from Kafka) and
updating the index.
3. Once we process a batch of messages, we call commit (first on
DirectoryTaxonomyWriter then on IndexWriter).
4. We use SearcherTaxonomyManager to manage instances of IndexSearcher.
5. We periodically call forceMerge on our IndexWriter (to improve
performance).
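
To make the commit ordering in step 3 concrete, here is a minimal sketch of
a helper that serializes the two commits across update threads. It is an
illustration under assumptions, not our production code: OrderedCommitter
and Committable are hypothetical names, and Committable is only a stand-in
for the commit() method that IndexWriter and DirectoryTaxonomyWriter both
expose, so the sketch compiles without Lucene on the classpath.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical helper: runs the taxonomy commit and the index commit as one
 * locked unit, so two update threads cannot interleave their commit pairs.
 */
class OrderedCommitter {

    /** Stand-in (assumption) for the commit() method of the two Lucene writers. */
    interface Committable {
        void commit() throws IOException;
    }

    private final Committable taxonomyWriter;
    private final Committable indexWriter;
    private final ReentrantLock commitLock = new ReentrantLock();

    OrderedCommitter(Committable taxonomyWriter, Committable indexWriter) {
        this.taxonomyWriter = taxonomyWriter;
        this.indexWriter = indexWriter;
    }

    /** Commits the taxonomy first, then the index, while holding the lock. */
    void commitBoth() throws IOException {
        commitLock.lock();
        try {
            taxonomyWriter.commit(); // taxonomy first, as in step 3
            indexWriter.commit();    // index second
        } finally {
            commitLock.unlock();
        }
    }
}
```

With real writers this would presumably be wired up as
new OrderedCommitter(taxoWriter::commit, indexWriter::commit), with each
update thread calling commitBoth() after it finishes a batch. Whether
maybeRefresh or forceMerge also need to take part in this lock is exactly
what the questions below ask.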

So, now to a few questions:
1. My understanding is that the right way to handle a
DirectoryTaxonomyWriter and an IndexWriter is to call commit on the
DirectoryTaxonomyWriter before the IndexWriter. Is this correct? Since
we're using multiple threads, we need to synchronize these calls within
the process regardless, but I'm curious to understand the design.

2. What about calls to maybeRefresh on SearcherTaxonomyManager? Do those
need to be synchronized with the commit calls to either IndexWriter or
DirectoryTaxonomyWriter? Do we need to call it every time we call commit?
The comment suggests we call it "periodically," but I'm not clear on how
often that should be or what conditions cause the index to change in a way
that would require it.

3. Lastly, what about forceMerge? Is there any concern there, or can it
safely run in the background? Is there any need to call commit afterward,
or does forceMerge effectively do that? Presumably, we would not see the
merged index until the next call to maybeRefresh?

Sorry, that was a lot of questions, would love help on any and all of them.

Thanks!
Will

[1] When calling maybeRefresh, we've seen errors that look like:
java.nio.file.NoSuchFileException: <snip>/6/_vj1.cfe
