lucene-java-user mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: heap memory issues when sorting by a string field
Date Tue, 08 Dec 2009 06:51:46 GMT
TCK,

CSIndexInput is returned by SegmentReader.getFieldCacheKey()
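
A minimal sketch of why a non-reader object can show up as a key. It assumes, as the note above implies, a cache that looks entries up by the object returned from getFieldCacheKey() rather than by the reader itself. KeyedReader, FieldValueCache and the loader function are hypothetical names for illustration, not Lucene's API. Two readers that return the same key share one cached entry, and that entry lives as long as the key object is reachable.

```java
import java.lang.ref.Reference;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

public class FieldCacheKeySketch {

    /** Hypothetical stand-in for an index reader; only its cache key matters here. */
    interface KeyedReader {
        Object getFieldCacheKey();
    }

    /** Two-level cache: weak outer map keyed by the reader's cache key, plain inner map keyed by field. */
    static final class FieldValueCache {
        private final Map<Object, Map<String, Object>> readerCache = new WeakHashMap<>();

        synchronized Object get(KeyedReader reader, String field, Function<String, Object> loader) {
            // The outer key is whatever object the reader hands back, not the reader itself,
            // so a non-reader object can legitimately appear as a key in readerCache.
            Object readerKey = reader.getFieldCacheKey();
            Map<String, Object> innerCache = readerCache.computeIfAbsent(readerKey, k -> new HashMap<>());
            return innerCache.computeIfAbsent(field, loader);
        }

        synchronized int readerEntries() {
            return readerCache.size();
        }
    }

    public static void main(String[] args) {
        Object sharedKey = new Object(); // plays the role of a shared per-segment object
        KeyedReader original = () -> sharedKey;
        KeyedReader clone = () -> sharedKey; // a second reader exposing the same key

        FieldValueCache cache = new FieldValueCache();
        int[] loads = {0};
        Function<String, Object> loader = field -> {
            loads[0]++;
            return new String[] {"a", "b"};
        };

        Object first = cache.get(original, "title", loader);
        Object second = cache.get(clone, "title", loader);

        System.out.println("same entry: " + (first == second));         // same entry: true
        System.out.println("loads: " + loads[0]);                       // loads: 1
        System.out.println("reader entries: " + cache.readerEntries()); // reader entries: 1
        Reference.reachabilityFence(sharedKey); // keep the weak key alive until here
    }
}
```
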

If you think it's a bug, it'd be good to open an issue and submit some
code as a patch, perhaps a test case showing that the WeakHashMap isn't
removing values as it's supposed to.

Jason

On Mon, Dec 7, 2009 at 10:45 PM, TCK <moonwatcher32329@gmail.com> wrote:
> Thanks for the feedback, guys. The evidence I have collected does point to
> an issue either in the Java WeakHashMap implementation or in Lucene's use
> of it. In particular, I used reflection to replace the WeakHashMap
> instances with my own dummy Map that does a no-op for the put operation,
> and although this caused tons more garbage to be created for each search
> query, the CMS executions always collected that garbage and brought the
> tenured space usage down to a small value.
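
A sketch of the diagnostic hack described above, assuming a hypothetical CacheHolder class with a private map field (it is not how Lucene's FieldCacheImpl is laid out). NoOpMap discards every put, so nothing survives between lookups and every lookup recomputes its value, which is where the extra short-lived garbage comes from. Swapping a private field via reflection is only suitable for diagnosis.

```java
import java.lang.reflect.Field;
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

public class NoOpMapSketch {

    /** A Map that silently discards every put, so it never retains anything. */
    static final class NoOpMap<K, V> extends AbstractMap<K, V> {
        @Override
        public V put(K key, V value) {
            return null; // drop the entry; callers recompute on every lookup
        }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            return Collections.emptySet(); // always empty, so get() always returns null
        }
    }

    /** Hypothetical cache holder with a private map field, standing in for the real cache class. */
    static final class CacheHolder {
        private Map<Object, Object> readerCache = new WeakHashMap<>();

        Object lookup(Object key) {
            Object v = readerCache.get(key);
            if (v == null) {
                v = "computed for " + key; // stands in for an expensive load
                readerCache.put(key, v);
            }
            return v;
        }

        int size() {
            return readerCache.size();
        }
    }

    /** Replaces the private field via reflection. */
    static void installNoOpMap(CacheHolder holder) throws ReflectiveOperationException {
        Field f = CacheHolder.class.getDeclaredField("readerCache");
        f.setAccessible(true);
        f.set(holder, new NoOpMap<Object, Object>());
    }

    public static void main(String[] args) throws Exception {
        CacheHolder holder = new CacheHolder();
        installNoOpMap(holder);
        holder.lookup("segment-1");
        System.out.println("entries retained: " + holder.size()); // entries retained: 0
    }
}
```
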
>
> Before that, I tried calling clear() on the WeakHashMap instances and then
> calling size(), which calls expungeStaleEntries(), but that didn't help.
> There may indeed be a case of double memory usage for a while (until
> enough CMS executions have run), as Tom pointed out, but I don't fully
> understand what's happening yet.
>
> Also, bizarrely, during my introspection of the readerCache WeakHashMap
> instances I found an instance of
> org.apache.lucene.index.CompoundFileReader$CSIndexInput as a key. Looking
> at the code, I don't see how anything other than an IndexReader could
> possibly get there, and this class certainly doesn't extend IndexReader.
> Any ideas? :-) Maybe I need to get more sleep.
>
> Btw, is searching with a sort by a text field not a common use case of
> Lucene? I've been testing with only 1 GB indexes, and I'm pretty sure there
> are much larger indexes out there.
>
> Cheers,
> TCK
>
>
>
>
>
> On Mon, Dec 7, 2009 at 7:57 PM, Tom Hill <solr-list@worldware.com> wrote:
>
>> Hey, that's a nice little class! I hadn't seen it before. It sounds like
>> the asynchronous cleanup might deal with the problem I mentioned above
>> (but I haven't looked at the code yet).
>>
>> It's an Apache license, but you mentioned something about no third-party
>> libraries. Is that a policy for Lucene?
>>
>> Thanks,
>>
>> Tom
>>
>>
>> On Mon, Dec 7, 2009 at 4:44 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>
>> > I wonder whether the Google Collections concurrent map, which supports
>> > weak keys, handles the removal of weakly referenced keys more elegantly
>> > than Java's WeakHashMap (even though we don't use third-party libraries).
>> >
>> > On Mon, Dec 7, 2009 at 4:38 PM, Tom Hill <solr-list@worldware.com> wrote:
>> > > Hi -
>> > >
>> > > If I understand correctly, WeakHashMap does not free the memory for the
>> > > value (cached data) when the key is nulled, or even when the key is
>> > > garbage collected.
>> > >
>> > > It requires one more step: a method on WeakHashMap must be called to
>> > > allow it to release its hard reference to the cached data. It appears
>> > > that most methods in WeakHashMap end up calling expungeStaleEntries,
>> > > which will clear the hard reference. But you have to call some method
>> > > on the map before the memory is eligible for garbage collection.
>> > >
>> > > So it requires four stages to free the cached data: null the key; a GC
>> > > to release the weak reference to the key; a call to some method on the
>> > > map; then the next GC cycle, which should free the value.
>> > >
>> > > So it seems possible that you could end up with double memory usage
>> > > for a time. If you don't have a GC between the time you close the old
>> > > reader and the time you start to load the field cache entry for the
>> > > next reader, then the key may still be hanging around uncollected.
>> > >
>> > > At that point, it may run a GC when you allocate the new cache, but
>> > > that's only the first GC. It can't free the cached data until after the
>> > > next call to expungeStaleEntries, so for a while you have both caches
>> > > around.
>> > >
>> > > This extra usage could cause things to move into tenured space. Could
>> > > this be causing your problem?
>> > >
>> > > A workaround would be to cause some method to be called on the
>> > > WeakHashMap. You don't want to call get(), since that will try to
>> > > populate the cache. Maybe try putting a small value into the cache,
>> > > doing a GC, and seeing if your memory drops then.
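
The four stages above can be observed with plain java.util.WeakHashMap. This is a sketch, not a Lucene test: the 8 MB byte[] stands in for a cached field, and because System.gc() is only a request, the later loops retry a bounded number of times. The first check is deterministic (the map's entry holds the value strongly until some map method expunges stale entries); the later ones assume the JVM honors the GC requests.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapStagesDemo {

    static final class Observation {
        boolean valueAliveBeforeMapCall;
        int sizeAfterExpunge;
        boolean valueAliveAfterMapCall;
    }

    /** Requests a GC and pauses briefly so cleared keys can be enqueued. */
    static void gcAndWait() {
        System.gc();
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static Observation observe() {
        Observation obs = new Observation();
        Map<Object, byte[]> cache = new WeakHashMap<>();
        Object key = new Object();
        byte[] value = new byte[8 * 1024 * 1024]; // stands in for a large cached field
        cache.put(key, value);
        WeakReference<byte[]> valueRef = new WeakReference<>(value);
        value = null;

        // Stage 1: drop the only strong reference to the key.
        key = null;
        // Stage 2: GC clears the weak reference to the key...
        for (int i = 0; i < 5; i++) {
            gcAndWait();
        }
        // ...but no map method has run yet, so the entry still holds the value strongly.
        obs.valueAliveBeforeMapCall = valueRef.get() != null;

        // Stage 3: any map method (size() here) expunges entries whose keys are gone.
        int size = cache.size();
        for (int i = 0; i < 50 && size != 0; i++) {
            gcAndWait();
            size = cache.size();
        }
        obs.sizeAfterExpunge = size;

        // Stage 4: with the entry gone, a later GC can reclaim the value.
        for (int i = 0; i < 50 && valueRef.get() != null; i++) {
            gcAndWait();
        }
        obs.valueAliveAfterMapCall = valueRef.get() != null;
        return obs;
    }

    public static void main(String[] args) {
        Observation obs = observe();
        System.out.println("value alive before any map call: " + obs.valueAliveBeforeMapCall);
        System.out.println("map size after expunge: " + obs.sizeAfterExpunge);
        System.out.println("value alive after expunge + GC: " + obs.valueAliveAfterMapCall);
    }
}
```
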
>> > >
>> > >
>> > > Tom
>> > >
>> > >
>> > >
>> > > On Mon, Dec 7, 2009 at 1:48 PM, TCK <moonwatcher32329@gmail.com> wrote:
>> > >
>> > >> Thanks for the response. But I'm definitely calling close() on the old
>> > >> reader and opening a new one (not using reopen). Also, to simplify the
>> > >> analysis, I did my test with a single-threaded requester to eliminate
>> > >> any concurrency issues.
>> > >>
>> > >> I'm doing:
>> > >> sSearcher.getIndexReader().close();
>> > >> sSearcher.close(); // this actually seems to be a no-op
>> > >> IndexReader newIndexReader = IndexReader.open(newDirectory);
>> > >> sSearcher = new IndexSearcher(newIndexReader);
>> > >>
>> > >> Btw, isn't it bad practice anyway to have an unbounded cache? Are there
>> > >> any plans to replace the HashMaps used for the innerCaches with an
>> > >> actual size-bounded cache with some eviction policy (perhaps EhCache or
>> > >> something)?
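
One way to bound a cache, sketched with java.util.LinkedHashMap in access order plus removeEldestEntry, which gives least-recently-used eviction. This is a generic illustration, not Lucene's FieldCache implementation; with a bound in place, an evicted field's values would simply have to be reloaded the next time a query sorts on that field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Size-bounded cache that evicts the least-recently-used entry once capacity is exceeded. */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the head (least recently used) when over capacity
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("title", "values for title");
        cache.put("author", "values for author");
        cache.get("title");                   // touch "title" so "author" becomes the eldest
        cache.put("date", "values for date"); // exceeds capacity, evicts "author"
        System.out.println(cache.keySet());   // [title, date]
    }
}
```
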
>> > >>
>> > >> Thanks again,
>> > >> TCK
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Dec 7, 2009 at 4:37 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>> > >>
>> > >> > What this sounds like is that you're not really closing your
>> > >> > readers even though you think you are. Sorting indeed uses up
>> > >> > significant memory when it populates internal caches and keeps
>> > >> > it around for later use (which is one of the reasons that warming
>> > >> > queries matter). But if you really do close the reader, I'm pretty
>> > >> > sure the memory should be GC-able.
>> > >> >
>> > >> > One thing that trips people up is IndexReader.reopen(). If it
>> > >> > returns a reader different from the original, you *must* close the
>> > >> > old one. If you don't, the old reader is still hanging around and
>> > >> > memory won't be returned. An example from the Javadocs:
>> > >> >
>> > >> >  IndexReader reader = ...
>> > >> >  ...
>> > >> >  IndexReader newReader = reader.reopen();
>> > >> >  if (newReader != reader) {
>> > >> >    ...     // reader was reopened
>> > >> >    reader.close();
>> > >> >  }
>> > >> >  reader = newReader;
>> > >> >  ...
>> > >> >
>> > >> >
>> > >> > If this is irrelevant, could you post your close/open code?
>> > >> >
>> > >> > HTH
>> > >> >
>> > >> > Erick
>> > >> >
>> > >> >
>> > >> > On Mon, Dec 7, 2009 at 4:27 PM, TCK <moonwatcher32329@gmail.com> wrote:
>> > >> >
>> > >> > > Hi,
>> > >> > > I'm having heap memory issues when I do Lucene queries involving
>> > >> > > sorting by a string field. Such queries seem to load a lot of data
>> > >> > > into the heap. Moreover, Lucene seems to hold on to references to
>> > >> > > this data even after the index reader has been closed and a full GC
>> > >> > > has been run. One consequence is that, in my generational heap
>> > >> > > configuration, a lot of memory gets promoted to tenured space each
>> > >> > > time I close the old index reader and then open and query a new one.
>> > >> > > The tenured space eventually gets fragmented, causing a lot of
>> > >> > > promotion failures that result in JVM hangs while the JVM does
>> > >> > > stop-the-world GCs.
>> > >> > >
>> > >> > > Does anyone know any workarounds to avoid these memory issues when
>> > >> > > doing such Lucene queries?
>> > >> > >
>> > >> > > My profiling showed that even after a full GC, Lucene is holding on
>> > >> > > to a lot of references to field value data, notably via
>> > >> > > FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the WeakHashMap
>> > >> > > readerCaches are using unbounded HashMaps as the innerCaches, and I
>> > >> > > used reflection to replace these innerCaches with dummy empty
>> > >> > > HashMaps, but I'm still seeing the same behavior. I wondered if
>> > >> > > anyone has gone through these same issues before and could offer
>> > >> > > any advice.
>> > >> > >
>> > >> > > Thanks a lot,
>> > >> > > TCK
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>>
>
