lucenenet-user mailing list archives

From Jonathan Resnick <jresn...@gmail.com>
Subject Re: does FieldCache play well with GC large object heap?
Date Sun, 14 Sep 2014 17:47:01 GMT
Nick: Thanks for the link, I hadn't realized that this feature was
available as of 4.5.1 (we're still on 4.0 actually, but this is a good
incentive to move).

Simon: Right, it makes sense that the newer segments would be relatively
small. And I guess that merges of the oldest, largest segments would be
infrequent enough that it shouldn't be a problem.

Thanks for your help! This gives me the confidence to proceed.

On Sat, Sep 13, 2014 at 11:21 PM, Simon Svensson <sisve@devhost.se> wrote:

> Hi,
>
> You can compact the large object heap on gen-2 collections since .NET 4.5.1:
> http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.largeobjectheapcompactionmode.aspx
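>
> For reference, a minimal sketch of opting in (GCSettings and
> GCLargeObjectHeapCompactionMode are the .NET 4.5.1 APIs from the link
> above; when you actually trigger the collection is up to your app):
>
>    using System;
>    using System.Runtime;
>
>    // Request a one-time LOH compaction on the next blocking gen-2
>    // collection; the setting resets to Default once it has run.
>    GCSettings.LargeObjectHeapCompactionMode =
>        GCLargeObjectHeapCompactionMode.CompactOnce;
>    GC.Collect(); // force a full, blocking collection now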
>
> The field cache will create an array large enough for an entry for every
> document in the passed reader. Remember that a normal reader consists of
> several smaller ones, one for every segment. Pass these inner readers
> instead to create several smaller arrays. These arrays are held in memory
> in dictionaries keyed by a weak reference to the passed reader. When
> there are no more references to the reader, i.e. when you've closed it,
> the array will be released (at the next garbage collection).
>
> Do you call IndexReader.Reopen to update your indexes? That one will reuse
> already created segment readers, thus also reusing any existing field cache
> arrays. Only the new segments will need to be read, and those are usually
> the small recent ones holding the latest changes. Segments are read-only;
> deletions are handled by a separate file which is basically just a large
> bit-field stating which documents are deleted. The segment data itself is
> unchanged, so deletions will not force a reload of data for the field cache.
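>
> A minimal sketch of that reopen pattern (assuming `reader` is your current
> IndexReader variable; Reopen returns the same instance when nothing has
> changed, and whether you call Dispose or Close depends on your Lucene.Net
> version):
>
>    var newReader = reader.Reopen();
>    if (newReader != reader)
>    {
>        // Only release the old reader once in-flight searches are done.
>        // Unchanged segment readers are shared by the new reader, so
>        // their field cache arrays survive the swap.
>        reader.Dispose();
>        reader = newReader;
>    }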
>
> What you describe, loading 40 MB arrays at every change, will only happen
> if you do a complete index optimization between your commit and your next
> use of the field cache.
>
> Example code taken from Stack Overflow:
> http://stackoverflow.com/questions/5455543/fieldcache-with-frequently-updating-index
>
>    using System.Collections.Generic;
>    using System.IO;
>    using System.Linq;
>    using Lucene.Net.Index;
>    using Lucene.Net.Search;
>    using Lucene.Net.Store;
>    using Lucene.Net.Util;
>
>    var directory = FSDirectory.Open(new DirectoryInfo("index"));
>    var reader = IndexReader.Open(directory, readOnly: true);
>    var documentId = 1337;
>
>    // Grab all subreaders (one per segment).
>    var subReaders = new List<IndexReader>();
>    ReaderUtil.GatherSubReaders(subReaders, reader);
>
>    // Find the subreader that holds the document: while subReaderId is at
>    // or beyond the subreader's document count, rebase it and move on.
>    var subReaderId = documentId;
>    var subReader = subReaders.First(sub => {
>        if (sub.MaxDoc() <= subReaderId) {
>            subReaderId -= sub.MaxDoc();
>            return false;
>        }
>
>        return true;
>    });
>
>    // The cache array is sized to the segment, not the whole index.
>    var values = FieldCache_Fields.DEFAULT.GetInts(subReader, "newsdate");
>    var value = values[subReaderId];
>
> // Simon
>
>
> On 14/09/14 04:33, Jonathan Resnick wrote:
>
>> Hi,
>>
>> I'm relatively new to Lucene.net.  We've recently integrated it into our
>> application to provide search functionality, and at this point I'm
>> considering introducing some custom scoring that will require use of the
>> FieldCache. Up until now I've been a little wary of making use of the
>> fieldcache because I know that it creates huge arrays, and my concern is
>> whether this is going to create issues for the GC - specifically wrt
>> fragmentation of the large object heap.
>>
>> For example, if we have ~10M documents, an int field cache will require
>> 40MB of contiguous memory, which will be allocated on the large object
>> heap. If we're opening new IndexReaders 1000s of times per day (because
>> we're adding/updating documents), then we're asking the GC to be
>> continually allocating and discarding these 40MB arrays. Since the large
>> object heap does not get compacted, and since the array size likely needs
>> to grow a bit each time (due to new docs added), it seems this would lead
>> to fragmentation and eventual out-of-memory conditions.  Is this an issue
>> in practice?
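>>
>> (For scale: 10,000,000 documents × 4 bytes per int = 40,000,000 bytes,
>> roughly 38 MB, well past the 85,000-byte threshold above which .NET
>> allocates objects on the large object heap.)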
>>
>> If anyone with more Lucene.net experience could share some insight here,
>> it
>> would be much appreciated.
>>
>> -Jon
>>
>>
>
