lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene memory usage
Date Wed, 10 Jun 2009 17:52:57 GMT
Asking for top 100K docs will certainly consume more RAM than asking
for top 2, but much less than 1 GB.

More like maybe an added ~2-3 MB or so.

Mike

On Wed, Jun 10, 2009 at 1:30 PM, Zhang,
Lisheng<Lisheng.Zhang@broadvision.com> wrote:
> Hi,
>
> Does this issue has anything to do with the line:
>
>> TopScoreDocCollector collector = new TopScoreDocCollector(100000);
>
> if we do:
>
>> TopScoreDocCollector collector = new TopScoreDocCollector(2);
>
> instead (only see top two documents), could memory usage be less?
>
> Best regards, Lisheng
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wednesday, June 10, 2009 5:40 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene memory usage
>
>
> This (very large number of unique terms) is a problem for Lucene currently.
>
> There are some simple improvements we could make to the terms dict
> format to not require so much RAM per term in the terms index...
> LUCENE-1458 (flexible indexing) has these improvements, but
> unfortunately tied in w/ lots of other changes.  Maybe we should break
> out a separate issue for this... this'd be a great contained
> improvement, if anyone out there has "the itch" :)
>
> One simple workaround is to call IndexReader.setTermIndexInterval
> immediately after opening the reader; this simply loads fewer terms in
> the index, using far less RAM, but at the expense of somewhat slower
> searching.
>
> Also: you should peek at your index, eg using Luke, to understand why
> you have so many terms.  It could be legitimate (indexing a massive
> catalog with eg part numbers), or, it could be your document filtering
> / analyzer are accidentally producing garbage terms.
>
> Mike
>
> On Wed, Jun 10, 2009 at 8:23 AM, Benedikt Boss<nanoc@web.de> wrote:
>> Hej hej,
>>
>> i have a question regarding lucenes memory usage
>> when launching a query. When i execute my query
>> lucene eats up over 1gig of heap-memory even
>> when my result-set is only a single hit. I
>> found out that this is due to the "ensureIndexIsRead()"
>> method-call in the "TermInfosReader" class, which
>> iterates over all Terms found in the index and saves
>> them (including all value-strings) in a Term-Array.
>> Is it possible to not read all that stuff
>> into memory at all?
>>
>> Im doing the query like in the following pseudo-code:
>> ------------------------------------------------------------------------
>>
>> TopScoreDocCollector collector = new TopScoreDocCollector(100000);
>>
>> QueryParser   parser= new QueryParser(field, new WhitespaceAnalyzer() );
>> Directory     fsDir = new FSDirectory(indexDir, null);
>> IndexSearcher is    = new IndexSearcher(fsdir);
>>
>> Query         query = parser.parse(q);
>>
>> is.search(query, collector);
>> ScoreDoc[] hits = collector.topDocs();
>>
>> ....... < iterate over hits and print results >
>>
>>
>> Thanks in advance
>> Benedikt
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message