lucene-java-user mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: heap memory issues when sorting by a string field
Date Tue, 08 Dec 2009 00:44:41 GMT
I wonder if the Google Collections concurrent map, which supports weak
keys, handles the removal of weakly referenced keys more elegantly than
Java's WeakHashMap does (even though we don't use third-party
libraries)?

On Mon, Dec 7, 2009 at 4:38 PM, Tom Hill <solr-list@worldware.com> wrote:
> Hi -
>
> If I understand correctly, WeakHashMap does not free the memory for the
> value (cached data) when the key is nulled, or even when the key is garbage
> collected.
>
> It requires one more step: some method on the WeakHashMap must be called
> before it releases its hard reference to the cached data. Most methods in
> WeakHashMap end up calling expungeStaleEntries, which clears that hard
> reference. But until some method is called on the map, the memory is not
> eligible for garbage collection.
>
> So freeing the cached data takes four stages: (1) null the key; (2) a GC
> to clear the weak reference to the key; (3) a call to some method on the
> map, which expunges the stale entry; (4) the next GC cycle, which can
> finally free the value.
>
> So it seems possible that you could end up with double memory usage for a
> time. If you don't have a GC between the time that you close the old reader,
> and you start to load the field cache entry for the next reader, then the
> key may still be hanging around uncollected.
>
> At that point, it may run a GC when you allocate the new cache, but that's
> only the first GC. It can't free the cached data until after the next call
> to expungeStaleEntries, so for a while you have both caches around.
>
> This extra usage could cause things to move into tenured space. Could this
> be causing your problem?
>
> A workaround would be to cause some method to be called on the
> WeakHashMap. You don't want to call get(), since that will try to
> populate the cache. Maybe try putting a small value into the cache, then
> forcing a GC, and see whether your memory drops.
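The lifecycle described above can be reproduced in isolation with a plain WeakHashMap. This is only an illustrative sketch, not anything Lucene does: the 1 MB value stands in for a field-cache entry, and the gc-polling loop is just a way to make the weak reference clear reliably in a demo.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Object, byte[]> cache = new WeakHashMap<>();
        Object key = new Object();
        WeakReference<Object> probe = new WeakReference<>(key);
        cache.put(key, new byte[1024 * 1024]); // 1 MB stand-in for cached data

        key = null; // stage 1: drop the strong reference to the key

        // stage 2: wait for a GC to clear the weak reference to the key
        for (int i = 0; i < 50 && probe.get() != null; i++) {
            System.gc();
            Thread.sleep(10);
        }

        // stage 3: the map still holds a hard reference to the value until
        // some method on it runs expungeStaleEntries; size() is enough
        int entries = cache.size();

        // stage 4: only the next GC after this point can reclaim the value
        System.out.println("entries after expunge: " + entries);
    }
}
```

Note that the value stays strongly reachable between stages 2 and 3, which is exactly the window in which two field caches could coexist on the heap.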
>
>
> Tom
>
>
>
> On Mon, Dec 7, 2009 at 1:48 PM, TCK <moonwatcher32329@gmail.com> wrote:
>
>> Thanks for the response. But I'm definitely calling close() on the old
>> reader and opening a new one (not using reopen). Also, to simplify the
>> analysis, I did my test with a single-threaded requester to eliminate any
>> concurrency issues.
>>
>> I'm doing:
>> sSearcher.getIndexReader().close();
>> sSearcher.close(); // this actually seems to be a no-op
>> IndexReader newIndexReader = IndexReader.open(newDirectory);
>> sSearcher = new IndexSearcher(newIndexReader);
>>
>> Btw, isn't it bad practice anyway to have an unbounded cache? Are there any
>> plans to replace the HashMaps used for the innerCaches with an actual
>> size-bounded cache with some eviction policy (perhaps EhCache or something)
>> ?
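A size-bounded cache of the kind the question alludes to can be sketched with LinkedHashMap's removeEldestEntry hook. This is only an illustration of an LRU eviction policy, not anything Lucene's FieldCacheImpl actually provides:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded inner cache: access-ordered LinkedHashMap that
// evicts the least-recently-used entry once maxEntries is exceeded.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a" so "b" becomes the eldest entry
        cache.put("c", "3"); // exceeds the bound, evicting "b"
        System.out.println(cache.keySet());
    }
}
```

An eviction policy like this trades the risk of re-loading field data for a hard cap on heap usage.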
>>
>> Thanks again,
>> TCK
>>
>>
>>
>>
>> On Mon, Dec 7, 2009 at 4:37 PM, Erick Erickson <erickerickson@gmail.com
>> >wrote:
>>
>> > What this sounds like is that you're not really closing your
>> > readers even though you think you are. Sorting indeed uses up
>> > significant memory when it populates internal caches and keeps
>> > it around for later use (which is one of the reasons that warming
>> > queries matter). But if you really do close the reader, I'm pretty
>> > sure the memory should be GC-able.
>> >
>> > One thing that trips people up is IndexReader.reopen(). If it
>> > returns a reader different than the original, you *must* close the
>> > old one. If you don't, the old reader is still hanging around and
>> > memory won't be returned. An example from the Javadocs:
>> >
>> >  IndexReader reader = ...
>> >  ...
>> >  IndexReader newReader = reader.reopen();
>> >  if (newReader != reader) {
>> >    ...     // reader was reopened
>> >    reader.close();
>> >  }
>> >  reader = newReader;
>> >  ...
>> >
>> >
>> > If this is irrelevant, could you post your close/open code?
>> >
>> > HTH
>> >
>> > Erick
>> >
>> >
>> > On Mon, Dec 7, 2009 at 4:27 PM, TCK <moonwatcher32329@gmail.com> wrote:
>> >
>> > > Hi,
>> > > I'm having heap memory issues when I do lucene queries involving
>> > > sorting by a string field. Such queries seem to load a lot of data
>> > > into the heap. Moreover lucene seems to hold on to references to
>> > > this data even after the index reader has been closed and a full GC
>> > > has been run. One consequence is that in my generational heap
>> > > configuration a lot of memory gets promoted to tenured space each
>> > > time I close the old index reader and open and query with a new
>> > > one, and the tenured space eventually gets fragmented, causing a
>> > > lot of promotion failures and resulting in JVM hangs while the JVM
>> > > does stop-the-world GCs.
>> > >
>> > > Does anyone know any workarounds to avoid these memory issues when
>> > > doing such lucene queries?
>> > >
>> > > My profiling showed that even after a full GC lucene is holding on
>> > > to a lot of references to field value data, notably via the
>> > > FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the
>> > > WeakHashMap readerCaches are using unbounded HashMaps as the
>> > > innerCaches, and I used reflection to replace these innerCaches
>> > > with dummy empty HashMaps, but I'm still seeing the same behavior.
>> > > I wondered if anyone has gone through these same issues before and
>> > > could offer any advice.
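The kind of reflection hack described above can be sketched against a stand-in class. Holder and innerCache here are hypothetical names; the real FieldCacheImpl field names differ across Lucene versions:

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Stand-in for a cache holder with a private inner map.
class Holder {
    private final Map<String, String> innerCache = new HashMap<>();
    Holder() { innerCache.put("field", "terms"); }
    int size() { return innerCache.size(); }
}

public class ReflectionClearDemo {
    public static void main(String[] args) throws Exception {
        Holder h = new Holder();
        // Reach the private map via reflection and empty it.
        Field f = Holder.class.getDeclaredField("innerCache");
        f.setAccessible(true);
        ((Map<?, ?>) f.get(h)).clear();
        System.out.println("entries: " + h.size());
    }
}
```

As the thread shows, clearing the inner maps this way did not help here, which points at other strong references to the field data.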
>> > >
>> > > Thanks a lot,
>> > > TCK
>> > >
>> >
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

