lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: MmapDirectory and IndexReader reuse
Date Fri, 15 Jul 2016 10:00:53 GMT
Hi,

You should keep the IndexReader open the whole time! Otherwise you will hit even more
bottlenecks and slowdowns.

If you are updating the index, you should use SearcherManager, which reopens the index reader
as needed. After updating the index you should also not completely close and reopen it.
SearcherManager uses the DirectoryReader.openIfChanged() method, which just updates the "view"
currently seen and involves minimal syscalls (none at all if nothing has changed).
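
A minimal sketch of that pattern, assuming a recent Lucene version with the Path-based
MMapDirectory constructor. The class name SharedSearcher and the methods refresh() and
countHits() are hypothetical helpers made up for illustration; only the Lucene calls
themselves (SearcherManager, acquire/release, maybeRefresh) are real API.

```java
import java.io.IOException;
import java.nio.file.Path;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

/** Hypothetical helper: one long-lived SearcherManager per index directory. */
final class SharedSearcher implements AutoCloseable {
    private final Directory directory;
    private final SearcherManager manager;

    SharedSearcher(Path indexPath) throws IOException {
        // Opened once, so the open()/mmap() cost is paid here and not per search.
        this.directory = new MMapDirectory(indexPath);
        // A null SearcherFactory means "create plain IndexSearcher instances".
        this.manager = new SearcherManager(directory, null);
    }

    /** Call after the indexer has committed; cheap when nothing changed. */
    void refresh() throws IOException {
        manager.maybeRefresh();
    }

    /** Counts exact-term matches; field and value are illustrative only. */
    int countHits(String field, String value) throws IOException {
        IndexSearcher searcher = manager.acquire();
        try {
            return searcher.count(new TermQuery(new Term(field, value)));
        } finally {
            // Always release, otherwise the underlying reader is never freed.
            manager.release(searcher);
        }
    }

    @Override
    public void close() throws IOException {
        manager.close();
        directory.close();
    }
}
```

With hundreds of indexes, one such object per index could be kept in a map for the lifetime
of the webapp, with refresh() called once the indexer has finished its commit.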

> My worry is what happens if indexer runs and writes to the index files
> while they are mmap'ed in memory - could this lead to corrupted search ?

No, because Lucene never changes existing files. All changes are written to new files, which
become visible after flushing/committing or reopening as described above. In addition, merging
of those immutable segments is done in the background while indexing, but all files currently
referenced by IndexReaders/IndexSearchers remain immutable and stay alive until the IndexReader
is closed.
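
To see this point-in-time behaviour for yourself, something like the following sketch could
be run against a fresh, empty directory. The class SnapshotDemo and its run() method are
made up for illustration, and the field name "id" is arbitrary; it assumes a Lucene version
that has StandardAnalyzer and the Path-based MMapDirectory constructor on the classpath.

```java
import java.io.IOException;
import java.nio.file.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

/** Hypothetical demo: an open reader keeps its snapshot across a later commit. */
final class SnapshotDemo {

    private static Document doc(String id) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.NO));
        return d;
    }

    /** Returns {docs seen by the old reader, docs seen by the refreshed reader}. */
    static int[] run(Path emptyIndexPath) throws IOException {
        try (Directory dir = new MMapDirectory(emptyIndexPath);
             StandardAnalyzer analyzer = new StandardAnalyzer();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            writer.addDocument(doc("1"));
            writer.commit();

            // Reader opened on the first commit; its segment files are mmap'ed now.
            DirectoryReader oldReader = DirectoryReader.open(dir);
            try {
                // The second commit writes new files and leaves the old ones alone.
                writer.addDocument(doc("2"));
                writer.commit();
                int seenByOld = oldReader.numDocs();

                // openIfChanged returns null when the index has not changed.
                DirectoryReader newReader = DirectoryReader.openIfChanged(oldReader);
                try {
                    DirectoryReader current = (newReader != null) ? newReader : oldReader;
                    return new int[] {seenByOld, current.numDocs()};
                } finally {
                    if (newReader != null) {
                        newReader.close();
                    }
                }
            } finally {
                oldReader.close();
            }
        }
    }
}
```

On an empty directory the old reader should still report the single document from the first
commit, while the refreshed reader reports both.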

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Vladimir Kotal [mailto:vladimir.kotal@oracle.com]
> Sent: Friday, July 15, 2016 11:49 AM
> To: java-user@lucene.apache.org
> Subject: MmapDirectory and IndexReader reuse
> 
> 
> Hi all,
> 
> when trying to identify bottlenecks in our application, I found that
> each search which involves multiple indexes is performing lots of
> mmap()/open() syscalls. This is a natural consequence of using
> MmapDirectory. So even if file system caches are properly warmed, this
> might add a couple of seconds (depending on operating system or
> virtualization technology) to the request handling time, especially when
> the number of searched indexes is in hundreds (see
> https://github.com/OpenGrok/OpenGrok/issues/1116 for the gory detail).
> 
> I was wondering if we can amortize the syscall load by caching
> IndexReader objects. The search (which is done in webapp) looks like this:
> 
> 
> https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/search/SearchEngine.java#L203
> 
> and the idea would be to reuse each IndexReader until the next refresh
> of its pertaining index. This would avoid the syscalls during
> MmapDirectory.open().
> 
> My worry is what happens if indexer runs and writes to the index files
> while they are mmap'ed in memory - could this lead to corrupted search ?
> 
> The reindex work is visible here:
> 
> 
> https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/index/IndexDatabase.java#L341
> 
> The documents are added or removed in the call to indexDown(), which is
> basically a recursive traversal of the directory tree. The commit happens
> only after the traversal is done.
> 
> The IndexWriter is set up with CREATE_OR_APPEND, which I am not sure is
> desired for the reuse. If we can avoid existing index files being written
> to (or at least make sure they are only appended to) while reindexing,
> this should make the reuse possible, I think.
> 
> Any comments are welcome,
> 
> 
> v.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

