lucene-general mailing list archives

From siddharth teotia <siddharthteo...@gmail.com>
Subject Re: Memory usage
Date Mon, 11 Nov 2019 20:40:59 GMT
Thanks, Stephen. I have asked my questions on solr-user@lucene.apache.org.

On Mon, Nov 11, 2019 at 11:27 AM Stephen Bianamara <sbianamara@panopto.com>
wrote:

> Siddharth -- Part of the confusion here is that this is not the right email
> list to ask. General is about releases, publicity, and things of that
> nature. Technical threads like this are more suited for
> solr-user@lucene.apache.org. Please subscribe there and redirect your
> question there instead.
>
> Best,
> Stephen
>
> On Mon, Nov 11, 2019 at 11:18 AM siddharth teotia <siddharthteotia@gmail.com>
> wrote:
>
> > Hi Michael
> >
> > Can you or someone from the community please help answer my questions?
> >
> > Thanks
> > Siddharth
> >
> > On Thu, Nov 7, 2019 at 7:50 AM siddharth teotia <siddharthteotia@gmail.com>
> > wrote:
> >
> > > Hi Michael
> > >
> > > Thanks a lot for your response. A couple more questions:
> > >
> > > (1) During indexing, is there any knob to tell the writer to use off-heap
> > > memory for buffering? I didn't find anything in the docs, so the answer is
> > > probably no. Just confirming.
> > >
> > > (2) In my experiments, I have ingested up to 5 million documents into the
> > > Lucene index, and only 1 segment was created. The writer was committed and
> > > closed after ingesting all the documents, and after that there is no need
> > > for us to index more, so it is essentially an immutable index. I wanted to
> > > find the threshold for creating a new segment. Is it pretty high? Or, if
> > > the writer is reopened, will the next set of documents go into the next
> > > segment, and so on? The reason for asking is to find the total number of
> > > files (per index) that will be opened during querying. So far, since there
> > > was a single segment, only that segment's cfs file was opened.
> > >
> > > Thanks
> > > Siddharth
> > >
> > > On Thu, Nov 7, 2019, 6:39 AM Michael McCandless <lucene@mikemccandless.com>
> > > wrote:
> > >
> > >> Hi Siddharth,
> > >>
> > >> Your understanding of MMapDirectory is correct -- only give your JVM
> > >> enough heap to not spend too much CPU on GC, and then let the OS use all
> > >> available remaining RAM to cache hot pages from your index.
> > >>
> > >> There are some structures Lucene loads into JVM heap, but even those have
> > >> recently been moving off-heap (accessed via the Directory), such as the
> > >> FSTs used for the terms index and the BKD index (for dimensional points).
> > >> I'm not sure exactly which structures are still on heap ... maybe the
> > >> live documents bitset?
> > >>
> > >> During indexing, recently indexed documents are buffered in JVM heap until
> > >> the IndexWriterConfig.setRAMBufferSizeMB threshold is reached, and then
> > >> they are written to the Directory as new segments.
> > >>
> > >> Mike McCandless
> > >>
> > >> http://blog.mikemccandless.com
> > >>
> > >>
> > >> On Wed, Nov 6, 2019 at 11:27 PM siddharth teotia <
> > >> siddharthteotia@gmail.com> wrote:
> > >>
> > >>> Hi All
> > >>>
> > >>> I have some questions about memory usage. I would really appreciate it
> > >>> if someone could help answer them.
> > >>>
> > >>> I understand from the docs that during reading/querying, Lucene uses
> > >>> MMapDirectory (assuming it is supported on the platform). So the Java
> > >>> heap overhead in this case will come purely from the objects that are
> > >>> allocated/instantiated on the query path to process the query, build
> > >>> results, etc. But the whole index itself will not be loaded into memory,
> > >>> because the files are memory-mapped. Is my understanding correct? In that
> > >>> case, we are better off not increasing the Java heap and keeping as much
> > >>> memory as possible available for the file system cache, so that mmap can
> > >>> do its job efficiently.
> > >>>
> > >>> However, are there any index structures that are completely loaded into
> > >>> memory regardless of whether MMapDirectory is used? If so, are they
> > >>> loaded into the Java heap, or is off-heap memory (direct buffers) used in
> > >>> such cases?
> > >>>
> > >>> Secondly, on the write path, I think that even though the writer opens an
> > >>> MMapDirectory, the writes are gathered/buffered in memory up to a flush
> > >>> threshold controlled by IndexWriterConfig. Is this buffering done in the
> > >>> Java heap or in direct memory?
> > >>>
> > >>> Thanks a lot for your help
> > >>> Siddharth
> > >>>
> > >>
> >
> > --
> > *Best Regards,*
> > *SIDDHARTH TEOTIA*
> > *2008C6PS540G*
> > *BITS PILANI- GOA CAMPUS*
> >
> > *+91 87911 75932*
> >
>
>
> --
> Thanks!
>
> Stephen Bianamara
> Search Technology - Technical Lead
>

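The MMapDirectory point discussed in this thread can be illustrated with plain JDK code. The sketch below is not Lucene code: it only shows the mechanism that MMapDirectory in Lucene 8.x builds on, FileChannel.map. The mapped buffer is a direct buffer whose pages are loaded from the OS page cache on access, so reading it does not copy the file into the Java heap. The class and method names (MmapSketch, sumMappedBytes, mappingIsDirect) are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of Lucene.
public class MmapSketch {

    // Maps the whole file read-only and sums its bytes. The bytes are read
    // straight from the mapping (backed by the OS page cache); no heap
    // byte[] holding the file contents is ever allocated.
    public static long sumMappedBytes(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF; // treat each byte as unsigned
            }
            return sum;
        }
    }

    // A mapped buffer is always a direct (off-heap) buffer.
    public static boolean mappingIsDirect(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).isDirect();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, (byte) 250});
        System.out.println("sum=" + sumMappedBytes(tmp) + " direct=" + mappingIsDirect(tmp));
    }
}
```

Mapped regions are not counted against -Xmx, which is why the advice in the thread is to size the heap only for GC headroom and leave the remaining RAM to the OS cache.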

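Mike's description of write-path buffering (documents held in JVM heap until the setRAMBufferSizeMB threshold is reached, then written out as a new segment) also bears on question (2) about where segment boundaries come from. Below is a toy model of that size-triggered flush, for illustration only. FlushModel and flushedSegmentSizes are hypothetical names; the model counts raw document bytes, whereas Lucene estimates the RAM used by its in-memory indexing structures, and it ignores merging, so the real segment count after commit can be lower.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of size-triggered flushing; not Lucene code.
public class FlushModel {

    // Returns the number of documents in each flushed segment, before any merging.
    public static List<Integer> flushedSegmentSizes(long[] docBytes, double ramBufferMB) {
        long threshold = (long) (ramBufferMB * 1024 * 1024);
        List<Integer> segments = new ArrayList<>();
        long bufferedBytes = 0;
        int bufferedDocs = 0;
        for (long bytes : docBytes) {
            bufferedBytes += bytes;
            bufferedDocs++;
            if (bufferedBytes >= threshold) {
                segments.add(bufferedDocs); // buffer full: write a new segment
                bufferedBytes = 0;
                bufferedDocs = 0;
            }
        }
        if (bufferedDocs > 0) {
            segments.add(bufferedDocs); // commit/close flushes whatever is left
        }
        return segments;
    }

    public static void main(String[] args) {
        long[] docs = new long[10];
        Arrays.fill(docs, 400 * 1024L); // ten documents of ~400 KB each
        System.out.println(flushedSegmentSizes(docs, 1.0)); // prints [3, 3, 3, 1]
    }
}
```

With ten ~400 KB documents and a 1 MB buffer, the model flushes after every third document and the final commit flushes the last one. In real Lucene, the default RAM buffer is 16 MB (IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB), setMaxBufferedDocs can also trigger flushes by document count, and documents added after reopening a writer go into new segments, because existing segments are never modified in place.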
