lucene-java-user mailing list archives

From "Zhang, Lisheng"
Subject RE: Lucene memory usage
Date Wed, 10 Jun 2009 17:30:07 GMT

Does this issue have anything to do with the line:

> TopScoreDocCollector collector = new TopScoreDocCollector(100000);

If we do:

> TopScoreDocCollector collector = new TopScoreDocCollector(2);

instead (to see only the top two documents), would memory usage be lower?

Best regards, Lisheng
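
To make the question concrete, here is a minimal plain-Java sketch of a top-N collector, assuming it keeps only the best N scores in a bounded priority queue (as top-N collectors typically do). The class `TopNScores` and its methods are hypothetical and are not Lucene API. Under that assumption the collector's memory grows with N, not with the number of matching documents or the number of terms in the index.

```java
import java.util.PriorityQueue;

// Hypothetical illustration, not Lucene code: keeps only the N best scores.
final class TopNScores {
    private final int capacity;
    // Min-heap: the weakest score that is still retained sits on top.
    private final PriorityQueue<Float> heap;

    TopNScores(int capacity) {
        this.capacity = capacity;
        this.heap = new PriorityQueue<>(capacity);
    }

    // Offer one hit; it is kept only if it beats the weakest retained score.
    void collect(float score) {
        if (heap.size() < capacity) {
            heap.add(score);
        } else if (score > heap.peek()) {
            heap.poll();
            heap.add(score);
        }
    }

    // Number of scores currently held; never exceeds the capacity.
    int retained() {
        return heap.size();
    }

    // Lowest score among the retained hits.
    float weakestKept() {
        return heap.peek();
    }

    public static void main(String[] args) {
        TopNScores top = new TopNScores(2);
        for (int i = 0; i < 1000; i++) {
            top.collect(i * 0.5f);
        }
        System.out.println("retained=" + top.retained()
                + " weakestKept=" + top.weakestKept());
    }
}
```

In this sketch, shrinking N from 100000 to 2 shrinks only the queue itself; any memory used elsewhere, such as a terms index loaded by the reader, is unaffected.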

-----Original Message-----
From: Michael McCandless
Sent: Wednesday, June 10, 2009 5:40 AM
Subject: Re: Lucene memory usage

This (very large number of unique terms) is a problem for Lucene currently.

There are some simple improvements we could make to the terms dict
format to not require so much RAM per term in the terms index...
LUCENE-1458 (flexible indexing) has these improvements, but
unfortunately tied in w/ lots of other changes.  Maybe we should break
out a separate issue for this... this'd be a great contained
improvement, if anyone out there has "the itch" :)

One simple workaround is to call IndexReader.setTermInfosIndexDivisor
immediately after opening the reader; this simply loads fewer terms in
the index, using far less RAM, but at the expense of somewhat slower
searching.

Also: you should peek at your index, e.g. using Luke, to understand why
you have so many terms.  It could be legitimate (indexing a massive
catalog with, e.g., part numbers), or it could be that your document
filtering / analyzer is accidentally producing garbage terms.
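
The trade-off behind the "load fewer terms in the index" workaround above can be illustrated with a small plain-Java sketch. It assumes a sorted terms dictionary of which only every Nth term is held in RAM, with lookups scanning forward from the nearest sampled term. The class `SampledTermIndex`, its methods, and the generated part-number terms are hypothetical; this is not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code: a sparse in-RAM index over a sorted term list.
final class SampledTermIndex {
    private final String[] sortedTerms;                         // stands in for the on-disk terms dictionary
    private final List<String> indexTerms = new ArrayList<>();  // the in-RAM sample
    private final int interval;

    SampledTermIndex(String[] sortedTerms, int interval) {
        this.sortedTerms = sortedTerms;
        this.interval = interval;
        // Keep every interval-th term in memory; a larger interval means less RAM.
        for (int i = 0; i < sortedTerms.length; i += interval) {
            indexTerms.add(sortedTerms[i]);
        }
    }

    int indexSize() {
        return indexTerms.size();
    }

    // Returns how many dictionary entries were scanned to find the term, or -1 if absent.
    int scanCost(String term) {
        // Binary search for the last sampled term that is <= the target.
        int lo = 0, hi = indexTerms.size() - 1, block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexTerms.get(mid).compareTo(term) <= 0) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (block < 0) {
            return -1;
        }
        // Sequential scan within the block; a larger interval means a longer scan.
        int start = block * interval;
        int end = Math.min(start + interval, sortedTerms.length);
        for (int i = start; i < end; i++) {
            if (sortedTerms[i].equals(term)) {
                return i - start + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = new String[1000];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = String.format("part-%04d", i);
        }
        for (int interval : new int[] {128, 512}) {
            SampledTermIndex idx = new SampledTermIndex(terms, interval);
            System.out.println("interval=" + interval
                    + " inRam=" + idx.indexSize()
                    + " scanCost=" + idx.scanCost("part-0500"));
        }
    }
}
```

With 1000 terms, going from an interval of 128 to 512 cuts the in-RAM sample from 8 entries to 2, while the scan needed to find the same term gets longer; that is the RAM-versus-speed exchange in miniature.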


On Wed, Jun 10, 2009 at 8:23 AM, Benedikt Boss wrote:
> Hej hej,
> I have a question regarding Lucene's memory usage
> when launching a query. When I execute my query,
> Lucene eats up over 1 GB of heap memory even
> when my result set is only a single hit. I
> found out that this is due to the "ensureIndexIsRead()"
> method call in the "TermInfosReader" class, which
> iterates over all terms found in the index and saves
> them (including all value strings) in a Term array.
> Is it possible to not read all that stuff
> into memory at all?
> I'm doing the query as in the following pseudo-code:
> ------------------------------------------------------------------------
> TopScoreDocCollector collector = new TopScoreDocCollector(100000);
> QueryParser   parser= new QueryParser(field, new WhitespaceAnalyzer() );
> Directory     fsDir = new FSDirectory(indexDir, null);
> IndexSearcher is    = new IndexSearcher(fsDir);
> Query         query = parser.parse(q);
> is.search(query, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> ....... < iterate over hits and print results >
> Thanks in advance
> Benedikt
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
