lucene-java-user mailing list archives

From Tom Hill <solr-l...@worldware.com>
Subject Re: heap memory issues when sorting by a string field
Date Tue, 08 Dec 2009 00:38:14 GMT
Hi -

If I understand correctly, WeakHashMap does not free the memory for the
value (cached data) when the key is nulled, or even when the key is garbage
collected.

It requires one more step: a method on WeakHashMap must be called to allow
it to release its hard reference to the cached data. It appears that most
methods in WeakHashMap end up calling expungeStaleEntries, which will clear
the hard reference. But you have to call some method on the map, before the
memory is eligible for garbage collection.

So it takes four steps to free the cached data: null the key; a GC to
clear the weak reference to the key; a call to some method on the map;
then the next GC cycle should free the value.
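
Here is a minimal sketch of those stages against a plain WeakHashMap, not
Lucene's FieldCache. The class and method names are made up for illustration,
and a bare Object stands in for the reader key. System.gc() is only a hint to
the JVM, so the second method retries and may still report false on some
configurations.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class; a plain Object stands in for the IndexReader key.
public class WeakCacheStages {

    // While the key is strongly reachable, a GC must not remove the entry.
    static boolean entryStaysWhileKeyHeld() {
        Map<Object, byte[]> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, new byte[1024]);
        System.gc();
        // Using key after the GC keeps it reachable up to this point.
        return cache.containsKey(key) && cache.size() == 1;
    }

    // Returns true once the entry is gone, false if no GC cleared the key in time.
    static boolean entryGoesAfterKeyDropped() {
        Map<Object, byte[]> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, new byte[1 << 20]);
        key = null;                       // stage 1: drop the strong reference to the key
        for (int i = 0; i < 50; i++) {
            System.gc();                  // stage 2: ask for a GC (only a hint)
            if (cache.size() == 0) {      // stage 3: size() expunges stale entries
                return true;              // stage 4 is the next GC, which can now reclaim the value
            }
            try {
                Thread.sleep(20);         // give the reference queue time to fill
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("kept while key held: " + entryStaysWhileKeyHeld());
        System.out.println("released after key dropped: " + entryGoesAfterKeyDropped());
    }
}
```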

So it seems possible that you could end up with double memory usage for a
time. If you don't have a GC between the time that you close the old reader,
and you start to load the field cache entry for the next reader, then the
key may still be hanging around uncollected.

At that point, it may run a GC when you allocate the new cache, but that's
only the first GC. It can't free the cached data until after the next call
to expungeStaleEntries, so for a while you have both caches around.

This extra usage could cause things to move into tenured space. Could this
be causing your problem?

A workaround would be to cause some method to be called on the WeakHashMap.
You don't want to call get(), since that will try to populate the cache.
Maybe try putting a small value into the cache and doing a GC, and
see if your memory drops then.
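
As a rough illustration of why touching the map matters, the sketch below
uses a plain WeakHashMap and a WeakReference probe on the value. Everything
here is hypothetical: the class name, the nudge() helper, and the choice of
size() as the cheap call (it is used in place of the small put suggested
above because it also runs the expunge step and loads nothing). How you would
reach Lucene's internal map is not shown.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class, not part of Lucene.
public class ExpungeNudge {

    // Cheap call on the map; WeakHashMap.size() expunges stale entries on a non-empty map.
    static void nudge(Map<?, ?> cache) {
        cache.size();
    }

    // True if the value is still reachable after GCs, as long as no map method has run.
    static boolean valuePinnedUntilMapTouched() {
        Map<Object, byte[]> cache = new WeakHashMap<>();
        Object key = new Object();
        byte[] value = new byte[1 << 20];
        WeakReference<byte[]> probe = new WeakReference<>(value);
        cache.put(key, value);
        key = null;                            // the key is now only weakly reachable
        value = null;                          // the map entry is the only strong path to the value
        System.gc();
        System.gc();
        boolean pinned = probe.get() != null;  // the stale entry still holds the value
        nudge(cache);                          // only now can the entry be dropped
        return pinned;
    }

    public static void main(String[] args) {
        System.out.println("value pinned before nudge: " + valuePinnedUntilMapTouched());
    }
}
```

After nudge() returns, a later GC is free to reclaim the value, which matches
the last of the four steps.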


Tom



On Mon, Dec 7, 2009 at 1:48 PM, TCK <moonwatcher32329@gmail.com> wrote:

> Thanks for the response. But I'm definitely calling close() on the old
> reader and opening a new one (not using reopen). Also, to simplify the
> analysis, I did my test with a single-threaded requester to eliminate any
> concurrency issues.
>
> I'm doing:
> sSearcher.getIndexReader().close();
> sSearcher.close(); // this actually seems to be a no-op
> IndexReader newIndexReader = IndexReader.open(newDirectory);
> sSearcher = new IndexSearcher(newIndexReader);
>
> Btw, isn't it bad practice anyway to have an unbounded cache? Are there any
> plans to replace the HashMaps used for the innerCaches with an actual
> size-bounded cache with some eviction policy (perhaps EhCache or something)?
>
> Thanks again,
> TCK
>
>
>
>
> On Mon, Dec 7, 2009 at 4:37 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > What this sounds like is that you're not really closing your
> > readers even though you think you are. Sorting indeed uses up
> > significant memory when it populates internal caches and keeps
> > it around for later use (which is one of the reasons that warming
> > queries matter). But if you really do close the reader, I'm pretty
> > sure the memory should be GC-able.
> >
> > One thing that trips people up is IndexReader.reopen(). If it
> > returns a reader different than the original, you *must* close the
> > old one. If you don't, the old reader is still hanging around and
> > memory won't be returned. An example from the Javadocs:
> >
> >  IndexReader reader = ...
> >  ...
> >  IndexReader newReader = reader.reopen();
> >  if (newReader != reader) {
> >   ...     // reader was reopened
> >   reader.close();
> >  }
> >  reader = newReader;
> >  ...
> >
> >
> > If this is irrelevant, could you post your close/open code?
> >
> > HTH
> >
> > Erick
> >
> >
> > On Mon, Dec 7, 2009 at 4:27 PM, TCK <moonwatcher32329@gmail.com> wrote:
> >
> > > Hi,
> > > I'm having heap memory issues when I do Lucene queries involving
> > > sorting by a string field. Such queries seem to load a lot of data
> > > into the heap. Moreover, Lucene seems to hold on to references to this
> > > data even after the index reader has been closed and a full GC has
> > > been run. One consequence is that, in my generational heap
> > > configuration, a lot of memory gets promoted to tenured space each
> > > time I close the old index reader and then open and query using a new
> > > one. The tenured space eventually gets fragmented, causing a lot of
> > > promotion failures, which result in JVM hangs while the JVM does
> > > stop-the-world GCs.
> > >
> > > Does anyone know any workarounds to avoid these memory issues when
> > > doing such Lucene queries?
> > >
> > > My profiling showed that even after a full GC Lucene is holding on to
> > > a lot of references to field value data, notably via the
> > > FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the WeakHashMap
> > > readerCaches are using unbounded HashMaps as the innerCaches, and I
> > > used reflection to replace these innerCaches with dummy empty
> > > HashMaps, but I'm still seeing the same behavior. I wondered if anyone
> > > has gone through these same issues before and could offer any advice.
> > >
> > > Thanks a lot,
> > > TCK
> > >
> >
>
