lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhang, Lisheng" <Lisheng.Zh...@BroadVision.com>
Subject RE: Lucene memory usage
Date Wed, 10 Jun 2009 17:30:07 GMT
Hi,

Does this issue has anything to do with the line:

> TopScoreDocCollector collector = new TopScoreDocCollector(100000);

if we do:

> TopScoreDocCollector collector = new TopScoreDocCollector(2);

instead (only see top two documents), could memory usage be less?

Best regards, Lisheng

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wednesday, June 10, 2009 5:40 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene memory usage


This (very large number of unique terms) is a problem for Lucene currently.

There are some simple improvements we could make to the terms dict
format to not require so much RAM per term in the terms index...
LUCENE-1458 (flexible indexing) has these improvements, but
unfortunately tied in w/ lots of other changes.  Maybe we should break
out a separate issue for this... this'd be a great contained
improvement, if anyone out there has "the itch" :)

One simple workaround is to call IndexReader.setTermIndexInterval
immediately after opening the reader; this simply loads fewer terms in
the index, using far less RAM, but at the expense of somewhat slower
searching.

Also: you should peek at your index, eg using Luke, to understand why
you have so many terms.  It could be legitimate (indexing a massive
catalog with eg part numbers), or, it could be your document filtering
/ analyzer are accidentally producing garbage terms.

Mike

On Wed, Jun 10, 2009 at 8:23 AM, Benedikt Boss<nanoc@web.de> wrote:
> Hej hej,
>
> i have a question regarding lucenes memory usage
> when launching a query. When i execute my query
> lucene eats up over 1gig of heap-memory even
> when my result-set is only a single hit. I
> found out that this is due to the "ensureIndexIsRead()"
> method-call in the "TermInfosReader" class, which
> iterates over all Terms found in the index and saves
> them (including all value-strings) in a Term-Array.
> Is it possible to not read all that stuff
> into memory at all?
>
> Im doing the query like in the following pseudo-code:
> ------------------------------------------------------------------------
>
> TopScoreDocCollector collector = new TopScoreDocCollector(100000);
>
> QueryParser   parser= new QueryParser(field, new WhitespaceAnalyzer() );
> Directory     fsDir = new FSDirectory(indexDir, null);
> IndexSearcher is    = new IndexSearcher(fsdir);
>
> Query         query = parser.parse(q);
>
> is.search(query, collector);
> ScoreDoc[] hits = collector.topDocs();
>
> ....... < iterate over hits and print results >
>
>
> Thanks in advance
> Benedikt
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message