lucene-java-user mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: heap memory issues when sorting by a string field
Date Tue, 08 Dec 2009 06:48:25 GMT
> It's an Apache license - but you mentioned something about no third-party
> libraries. Is that a policy for Lucene?

Pretty much... though one can always submit a patch anyway.

On Mon, Dec 7, 2009 at 4:57 PM, Tom Hill <solr-list@worldware.com> wrote:
> Hey, that's a nice little class! I hadn't seen it before. But it sounds like
> the asynchronous cleanup might deal with the problem I mentioned above (but
> I haven't looked at the code yet).
>
> It's an Apache license - but you mentioned something about no third-party
> libraries. Is that a policy for Lucene?
>
> Thanks,
>
> Tom
>
>
> On Mon, Dec 7, 2009 at 4:44 PM, Jason Rutherglen <jason.rutherglen@gmail.com>
> wrote:
>
>> I wonder if Google Collections' concurrent map (even though we don't use
>> third-party libraries), which supports weak keys, handles the removal of
>> weakly referenced keys more elegantly than Java's WeakHashMap?
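
For illustration, a minimal sketch of the weak-keyed concurrent map Jason
mentions, assuming the Google Collections MapMaker API (an example of the
idea only, not code Lucene uses):

    import java.util.concurrent.ConcurrentMap;
    import com.google.common.collect.MapMaker;
    import org.apache.lucene.index.IndexReader;

    // Weak keys: once a reader is no longer strongly referenced anywhere,
    // its entry can be reclaimed, and stale entries are cleaned up as a side
    // effect of ordinary operations on the concurrent map.
    ConcurrentMap<IndexReader, String[]> readerCache =
        new MapMaker().weakKeys().makeMap();

Whether this actually releases dead values more promptly than WeakHashMap's
expungeStaleEntries would still need to be checked against the Google
Collections source.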
>>
>> On Mon, Dec 7, 2009 at 4:38 PM, Tom Hill <solr-list@worldware.com> wrote:
>> > Hi -
>> >
>> > If I understand correctly, WeakHashMap does not free the memory for the
>> > value (cached data) when the key is nulled, or even when the key is
>> > garbage collected.
>> >
>> > It requires one more step: a method on WeakHashMap must be called to
>> > allow it to release its hard reference to the cached data. It appears
>> > that most methods in WeakHashMap end up calling expungeStaleEntries,
>> > which will clear the hard reference. But you have to call some method on
>> > the map before the memory is eligible for garbage collection.
>> >
>> > So it takes four stages to free the cached data: null the key; a GC to
>> > release the weak reference to the key; a call to some method on the map;
>> > then the next GC cycle should free the value.
>> >
>> > So it seems possible that you could end up with double memory usage for a
>> > time. If you don't have a GC between the time that you close the old
>> > reader and the time you start to load the field cache entry for the next
>> > reader, then the key may still be hanging around uncollected.
>> >
>> > At that point, it may run a GC when you allocate the new cache, but
>> > that's only the first GC. It can't free the cached data until after the
>> > next call to expungeStaleEntries, so for a while you have both caches
>> > around.
>> >
>> > This extra usage could cause things to move into tenured space. Could
>> > this be causing your problem?
>> >
>> > A workaround would be to cause some method to be called on the
>> > WeakHashMap. You don't want to call get(), since that will try to
>> > populate the cache. Maybe try putting a small value into the cache, then
>> > doing a GC, and see if your memory drops.
>> >
>> >
>> > Tom
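
A toy, standalone sketch of the four stages Tom describes (the 64 MB byte
array just stands in for a field cache entry; System.gc() is only a hint, so
exact timing varies):

    import java.util.Map;
    import java.util.WeakHashMap;

    public class WeakHashMapDemo {
        public static void main(String[] args) {
            Map<Object, byte[]> cache = new WeakHashMap<Object, byte[]>();
            Object key = new Object();
            cache.put(key, new byte[64 * 1024 * 1024]);  // the "cached data"

            key = null;      // stage 1: null the key
            System.gc();     // stage 2: the key is collected and its entry enqueued

            // The map's internal entry still holds a hard reference to the
            // 64 MB value here, so this GC cannot reclaim it yet.

            cache.size();    // stage 3: any map method runs expungeStaleEntries()
            System.gc();     // stage 4: the value is now unreachable and can be freed
        }
    }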
>> >
>> >
>> >
>> > On Mon, Dec 7, 2009 at 1:48 PM, TCK <moonwatcher32329@gmail.com> wrote:
>> >
>> >> Thanks for the response. But I'm definitely calling close() on the old
>> >> reader and opening a new one (not using reopen). Also, to simplify the
>> >> analysis, I did my test with a single-threaded requester to eliminate
>> >> any concurrency issues.
>> >>
>> >> I'm doing:
>> >> sSearcher.getIndexReader().close();
>> >> sSearcher.close(); // this actually seems to be a no-op
>> >> IndexReader newIndexReader = IndexReader.open(newDirectory);
>> >> sSearcher = new IndexSearcher(newIndexReader);
>> >>
>> >> Btw, isn't it bad practice anyway to have an unbounded cache? Are there
>> >> any plans to replace the HashMaps used for the innerCaches with an
>> >> actual size-bounded cache with some eviction policy (perhaps EhCache or
>> >> something)?
>> >>
>> >> Thanks again,
>> >> TCK
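
As a sketch of the size-bounded, eviction-based cache TCK asks about, here is
the plain-JDK LinkedHashMap approach (the class name and capacity are made up
for illustration; nothing like this exists in Lucene's FieldCache):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Access-ordered map that evicts its least recently used entry once it
    // grows past a fixed number of cached entries.
    public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        public BoundedLruCache(int maxEntries) {
            super(16, 0.75f, true);            // true = access order
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

A library such as EhCache layers expiry and configurable limits on top of the
same idea.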
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Dec 7, 2009 at 4:37 PM, Erick Erickson <erickerickson@gmail.com>
>> >> wrote:
>> >>
>> >> > What this sounds like is that you're not really closing your
>> >> > readers even though you think you are. Sorting indeed uses up
>> >> > significant memory when it populates internal caches and keeps
>> >> > them around for later use (which is one of the reasons that warming
>> >> > queries matter). But if you really do close the reader, I'm pretty
>> >> > sure the memory should be GC-able.
>> >> >
>> >> > One thing that trips people up is IndexReader.reopen(). If it
>> >> > returns a reader different from the original, you *must* close the
>> >> > old one. If you don't, the old reader is still hanging around and
>> >> > the memory won't be returned... An example from the Javadocs...
>> >> >
>> >> >  IndexReader reader = ...
>> >> >  ...
>> >> >  IndexReader newReader = reader.reopen();
>> >> >  if (newReader != reader) {
>> >> >    ...     // reader was reopened
>> >> >    reader.close();
>> >> >  }
>> >> >  reader = newReader;
>> >> >  ...
>> >> >
>> >> >
>> >> > If this is irrelevant, could you post your close/open code?
>> >> >
>> >> > HTH
>> >> >
>> >> > Erick
>> >> >
>> >> >
>> >> > On Mon, Dec 7, 2009 at 4:27 PM, TCK <moonwatcher32329@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Hi,
>> >> > > I'm having heap memory issues when I do Lucene queries involving
>> >> > > sorting by a string field. Such queries seem to load a lot of data
>> >> > > into the heap. Moreover, Lucene seems to hold on to references to
>> >> > > this data even after the index reader has been closed and a full GC
>> >> > > has been run. One consequence is that, in my generational heap
>> >> > > configuration, a lot of memory gets promoted to tenured space each
>> >> > > time I close the old index reader and open and query with a new one.
>> >> > > The tenured space eventually gets fragmented, causing a lot of
>> >> > > promotion failures, which result in JVM hangs while the JVM does
>> >> > > stop-the-world GCs.
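
For context, this is the kind of query being described; a sketch against the
Lucene 2.9-era API, with a made-up field name "category" and sSearcher being
the IndexSearcher TCK mentions. Sorting by a string field loads that field's
value for every document of the reader into FieldCache.DEFAULT:

    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;

    // The first such search on a reader populates the FieldCache entry
    // keyed on that reader; later sorted searches reuse it.
    Sort sort = new Sort(new SortField("category", SortField.STRING));
    TopDocs hits = sSearcher.search(new MatchAllDocsQuery(), null, 10, sort);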
>> >> > >
>> >> > > Does anyone know any workarounds to avoid these memory issues when
>> >> > > doing such Lucene queries?
>> >> > >
>> >> > > My profiling showed that even after a full GC, Lucene is holding on
>> >> > > to a lot of references to field value data, notably via
>> >> > > FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the WeakHashMap
>> >> > > readerCaches use unbounded HashMaps as the innerCaches, and I used
>> >> > > reflection to replace these innerCaches with dummy empty HashMaps,
>> >> > > but I'm still seeing the same behavior. I wondered if anyone has run
>> >> > > into these same issues before and could offer any advice.
>> >> > >
>> >> > > Thanks a lot,
>> >> > > TCK
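
If it helps to confirm what is still being held, Lucene 2.9 exposes the
FieldCache contents; a diagnostic sketch, assuming that API (it only reports,
it does not free anything):

    import org.apache.lucene.search.FieldCache;

    // Dump whatever the default FieldCache still references after the old
    // reader has been closed; each entry reports its reader key, field, and type.
    FieldCache.CacheEntry[] entries = FieldCache.DEFAULT.getCacheEntries();
    System.out.println(entries.length + " FieldCache entries still live:");
    for (FieldCache.CacheEntry entry : entries) {
        System.out.println("  " + entry);
    }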
>> >> > >
>> >> >
>> >>
>> >
>>
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

