lucene-java-user mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Lucene memory usage
Date Wed, 10 Jun 2009 18:02:54 GMT
> LUCENE-1458 (flexible indexing) has these improvements,

Mike, can you explain how it's different?  I looked through the code once
but yeah, it's in with a lot of other changes.

On Wed, Jun 10, 2009 at 5:40 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> This (very large number of unique terms) is a problem for Lucene currently.
>
> There are some simple improvements we could make to the terms dict
> format to not require so much RAM per term in the terms index...
> LUCENE-1458 (flexible indexing) has these improvements, but
> unfortunately tied in w/ lots of other changes.  Maybe we should break
> out a separate issue for this... this'd be a great contained
> improvement, if anyone out there has "the itch" :)
>
> One simple workaround is to call IndexReader.setTermInfosIndexDivisor
> immediately after opening the reader; this simply loads fewer terms in
> the index, using far less RAM, but at the expense of somewhat slower
> searching.
>
> Also: you should peek at your index, eg using Luke, to understand why
> you have so many terms.  It could be legitimate (indexing a massive
> catalog with eg part numbers), or, it could be your document filtering
> / analyzer are accidentally producing garbage terms.
>
> Mike
>
> On Wed, Jun 10, 2009 at 8:23 AM, Benedikt Boss<nanoc@web.de> wrote:
> > Hej hej,
> >
> > I have a question regarding Lucene's memory usage
> > when launching a query. When I execute my query,
> > Lucene eats up over 1 GB of heap memory even
> > when my result set is only a single hit. I
> > found out that this is due to the "ensureIndexIsRead()"
> > method call in the "TermInfosReader" class, which
> > iterates over all terms found in the index and saves
> > them (including all value strings) in a Term array.
> > Is it possible to avoid reading all of that
> > into memory?
> >
> > I'm running the query as in the following pseudo-code:
> > ------------------------------------------------------------------------
> >
> > // Collect up to 100000 top-scoring hits.
> > TopScoreDocCollector collector = new TopScoreDocCollector(100000);
> >
> > QueryParser   parser= new QueryParser(field, new WhitespaceAnalyzer() );
> > Directory     fsDir = new FSDirectory(indexDir, null);
> > IndexSearcher is    = new IndexSearcher(fsDir);
> >
> > Query         query = parser.parse(q);
> >
> > is.search(query, collector);
> > // topDocs() returns a TopDocs; the hits are in its scoreDocs field.
> > ScoreDoc[] hits = collector.topDocs().scoreDocs;
> >
> > ....... < iterate over hits and print results >
> >
> >
> > Thanks in advance
> > Benedikt
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

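The RAM-versus-speed trade-off behind the workaround Mike describes (loading
fewer terms into the in-memory terms index) can be illustrated with a toy
model. The sketch below is not Lucene's TermInfosReader: the class name
SparseTermIndex and its methods are hypothetical, and a sorted in-memory array
stands in for the on-disk terms dictionary. It assumes only every Nth term is
kept in RAM, so a larger interval means fewer terms held in memory but a
longer linear scan per lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model of a sparse terms index (hypothetical; not Lucene code).
 * The full sorted term list stands in for the on-disk terms dictionary;
 * only every interval-th term is copied into the in-memory sample.
 */
final class SparseTermIndex {
    private final String[] allTerms;     // stand-in for the on-disk dictionary (sorted)
    private final String[] indexedTerms; // the sample that would be held in RAM
    private final int interval;

    SparseTermIndex(String[] sortedTerms, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.allTerms = sortedTerms.clone();
        this.interval = interval;
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < allTerms.length; i += interval) {
            sample.add(allTerms[i]);
        }
        this.indexedTerms = sample.toArray(new String[0]);
    }

    /** Number of terms held in the in-memory sample. */
    int indexedTermCount() {
        return indexedTerms.length;
    }

    /**
     * Returns the position of term in the dictionary, or -1 if absent.
     * Binary-searches the sample, then scans at most interval entries.
     */
    int lookup(String term) {
        int pos = Arrays.binarySearch(indexedTerms, term);
        if (pos >= 0) {
            return pos * interval;   // term is itself a sampled entry
        }
        int block = -pos - 2;        // last sampled term that sorts before term
        if (block < 0) {
            return -1;               // term sorts before the first dictionary entry
        }
        int start = block * interval;
        int end = Math.min(start + interval, allTerms.length);
        for (int i = start; i < end; i++) {
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break;      // passed the spot where term would be
        }
        return -1;
    }
}
```

With ten sorted terms and an interval of 4, the sample holds three terms
instead of ten, and each lookup scans at most four dictionary entries after
the binary search. Raising the interval shrinks the sample further at the cost
of a longer scan, which is the same direction of trade-off Mike describes.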