lucenenet-user mailing list archives

From Simon Svensson <si...@devhost.se>
Subject Re: does FieldCache play well with GC large object heap?
Date Sun, 14 Sep 2014 03:21:45 GMT
Hi,

You can compact the large object heap on gen 2 collections since .NET 4.5.1:
http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.largeobjectheapcompactionmode.aspx
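
A minimal sketch of that setting (needs using System.Runtime and .NET 4.5.1 
or later):

    // Request a one-time compaction of the large object heap on the next
    // blocking gen 2 collection; the setting resets to Default afterwards.
    GCSettings.LargeObjectHeapCompactionMode =
        GCLargeObjectHeapCompactionMode.CompactOnce;
    GC.Collect();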

The field cache will create an array large enough for an entry for every 
document in the passed reader. Remember that a normal reader consists 
of several smaller ones, one for every segment. Pass these inner readers 
instead to create several smaller arrays. These arrays are held in 
memory in dictionaries keyed by a weak reference to the passed reader. 
When there are no more references to the reader, i.e. once you've closed 
it, the array becomes eligible for release at the next garbage collection.
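
As a rough sketch, priming the cache per segment rather than for the 
top-level reader could look like this (assuming an already opened 
IndexReader named reader; the field name "newsdate" is just the one from 
the example further down):

    // Collect the per-segment readers behind the top-level reader.
    var segmentReaders = new List<IndexReader>();
    ReaderUtil.GatherSubReaders(segmentReaders, reader);

    foreach (var segment in segmentReaders)
    {
        // One int[] per segment, each sized segment.MaxDoc(), instead of one
        // huge array sized like the whole index. Repeated calls on the same
        // segment reader return the same cached array.
        var perSegmentValues = FieldCache_Fields.DEFAULT.GetInts(segment, "newsdate");
    }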

Do you call IndexReader.Reopen to update your indexes? That one will 
reuse already created segment readers, thus also reusing any existing 
field cache arrays. Only new segments will need to be read, and those 
are usually the small, recent ones with the latest changes. Segments are 
read-only; deletions are handled by a separate file that is basically a 
large bit field stating which documents are deleted. The segment data 
itself is unchanged, so deletions will not force a reload of data for 
the field cache.
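
In code that could look roughly like this (Reopen and Dispose as in 
Lucene.Net 3.0.3; older versions use Close):

    // Refresh the reader after a commit. Unchanged segment readers are
    // reused, so their existing field cache arrays stay valid.
    var newReader = reader.Reopen();
    if (newReader != reader)
    {
        reader.Dispose();   // the old reader's obsolete per-segment arrays
                            // are collected once nothing references them
        reader = newReader;
    }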

What you describe, loading a 40 MB array on every change, will only 
happen if you run a complete index optimization between your commit and 
the field cache lookup.
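
As a sketch (assuming an open IndexWriter named writer), that problematic 
sequence would be:

    writer.Optimize();   // merges every segment into one new, large segment
    writer.Commit();

    // Nothing can be reused now: the reopened reader sees only the merged
    // segment, so the next field cache access builds one big array again.
    var reopened = reader.Reopen();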

Example code taken from Stack Overflow:
http://stackoverflow.com/questions/5455543/fieldcache-with-frequently-updating-index

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using Lucene.Net.Util;

    var directory = FSDirectory.Open(new DirectoryInfo("index"));
    var reader = IndexReader.Open(directory, readOnly: true);
    var documentId = 1337;

    // Grab all subreaders.
    var subReaders = new List<IndexReader>();
    ReaderUtil.GatherSubReaders(subReaders, reader);

    // Walk the subreaders. While the remaining id does not fall inside the
    // current subreader (it holds MaxDoc() documents), subtract its size
    // and move on to the next one.
    var subReaderId = documentId;
    var subReader = subReaders.First(sub =>
    {
        if (sub.MaxDoc() <= subReaderId)
        {
            subReaderId -= sub.MaxDoc();
            return false;
        }

        return true;
    });

    var values = FieldCache_Fields.DEFAULT.GetInts(subReader, "newsdate");
    var value = values[subReaderId];

// Simon

On 14/09/14 04:33, Jonathan Resnick wrote:
> Hi,
>
> I'm relatively new to Lucene.net.  We've recently integrated it into our
> application to provide search functionality, and at this point I'm
> considering introducing some custom scoring that will require use of the
> FieldCache. Up until now I've been a little wary of making use of the
> fieldcache because I know that it creates huge arrays, and my concern is
> whether this is going to create issues for the GC - specifically wrt
> fragmentation of the large object heap.
>
> For example, if we have ~10M documents, an int field cache will require
> 40MB of contiguous memory, which will be allocated on the large object
> heap. If we're opening new IndexReaders 1000s of times per day (because
> we're adding/updating documents), then we're asking the GC to be
> continually allocating and discarding these 40MB arrays. Since the large
> object heap does not get compacted, and since the array size likely needs
> to grow a bit each time (due to new docs added), it seems this would lead
> to fragmentation and eventual out-of-memory conditions.  Is this an issue
> in practice?
>
> If anyone with more Lucene.net experience could share some insight here, it
> would be much appreciated.
>
> -Jon
>

