lucene-general mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Memory usage
Date Thu, 07 Nov 2019 14:38:48 GMT
Hi Siddharth,

Your understanding of MMapDirectory is correct -- only give your JVM enough
heap to not spend too much CPU on GC, and then let the OS use all available
remaining RAM to cache hot pages from your index.
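
As a rough illustration of why a memory-mapped index does not count against
the Java heap, here is a minimal sketch using only the JDK's FileChannel.map.
It is not Lucene code: MmapSketch and mapReadOnly are hypothetical names, and
the temp file merely stands in for an index file.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    /**
     * Maps the whole file read-only. The returned buffer is backed by the
     * OS page cache rather than by a byte[] on the Java heap.
     */
    public static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for an index file.
        Path file = Files.createTempFile("mmap-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "hello index".getBytes(StandardCharsets.UTF_8));

        MappedByteBuffer buf = mapReadOnly(file);
        // Mapped buffers are direct buffers, i.e. not allocated on the heap.
        System.out.println("direct (off-heap): " + buf.isDirect());
        System.out.println("first byte: " + (char) buf.get(0));
    }
}
```

Pages are only brought into RAM when they are touched, so the more memory
left to the OS, the more of the hot index it can keep cached.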

Lucene does load some structures into JVM heap, but even those have recently
been moving off-heap (accessed via the Directory), such as the FSTs used for
the terms index and the BKD index (for dimensional points).  I'm not sure
exactly which structures are still on heap ... maybe the live documents
bitset?

During indexing, recently indexed documents are buffered in JVM heap until
the buffer reaches the limit set by IndexWriterConfig.setRAMBufferSizeMB;
they are then written to the Directory as new segments.
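
The buffer-then-flush behaviour can be pictured with a toy model. This is
only a sketch of the pattern, not Lucene's implementation: FlushSketch and
all of its members are hypothetical, documents are plain strings, and an
in-memory list stands in for segments written to the Directory. In Lucene
itself the threshold is the value given to
IndexWriterConfig.setRAMBufferSizeMB.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FlushSketch {
    private final long maxBufferedBytes;                    // toy analogue of the RAM buffer size
    private final List<String> buffer = new ArrayList<>();  // lives on the Java heap
    private long bufferedBytes = 0;
    // Stand-in for segments written out to the Directory.
    private final List<List<String>> segments = new ArrayList<>();

    public FlushSketch(long maxBufferedBytes) {
        this.maxBufferedBytes = maxBufferedBytes;
    }

    /** Buffers a document on the heap and flushes once the threshold is reached. */
    public void addDocument(String doc) {
        buffer.add(doc);
        bufferedBytes += doc.getBytes(StandardCharsets.UTF_8).length;
        if (bufferedBytes >= maxBufferedBytes) {
            flush();
        }
    }

    /** Moves the buffered documents into a new "segment" and frees the buffer. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        segments.add(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }

    public int segmentCount() {
        return segments.size();
    }

    public long bufferedBytes() {
        return bufferedBytes;
    }
}
```

The point of the model is that heap usage on the write path is bounded by
the configured threshold, not by the total size of the index.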

Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 6, 2019 at 11:27 PM siddharth teotia <siddharthteotia@gmail.com>
wrote:

> Hi All
>
> I have some questions about the memory usage. I would really appreciate if
> someone can help answer these.
>
> I understand from the docs that during reading/querying, Lucene uses
> MMapDirectory (assuming it is supported on the platform). So the Java heap
> overhead in this case will purely come from the objects that are
> allocated/instantiated on the query path to process the query and build
> results etc.  But the whole index itself will not be loaded into memory
> because we memory-mapped the file. Is my understanding correct? In this
> case, we are better off not increasing the Java heap and keeping as much
> as possible available for the file system cache for mmap to do its job
> efficiently.
>
> However, are there any portions of index structures that are completely
> loaded in memory regardless of whether it is MMapDirectory or not? If so,
> are they loaded in Java heap or do we use off-heap (direct buffers) in
> such cases?
>
> Secondly, on the write path I think even though the writer opens a
> MMapDirectory, the writes are gathered/buffered in memory up to a flush
> threshold controlled by IndexWriterConfig. Is this buffering done in Java
> heap or direct memory?
>
> Thanks a lot for help
> Siddharth
>
