lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: Memory Usage?
Date Mon, 12 Nov 2001 19:46:39 GMT
> From: Anders Nielsen [mailto:anders@visator.dk]
> 
> this was a big boolean query, with several prefixqueries but 
> no wildcard
> queries in the or-branches.

Well it looks like those prefixes are expanding to a lot of terms, a total
of over 40,000!  (A prefix query expands into a BooleanQuery with all the
terms matching the prefix.)

If most of these expansions are low-frequency, then a simple fix should
improve things considerably.  I've attached an optimized version of
TermQuery that will hold less memory per low-frequency term.  In particular,
if a term occurs fewer than 128 times then a 1024 byte InputStream buffer is
freed immediately.

Tell me how this works.  Please send another heap dump.

Longer term, or if lots of the expanded terms occur more than 128 times,
perhaps BooleanScorer should use a different algorithm when there are
thousands of terms.  In this case it might use less memory to construct an
array of score buckets for all documents.  If (query.termCount() * 1024) >
(12 * getMaxDoc()) then this would use less memory.  In your case, with
500,000 documents and a 40,000 term query, it's currently taking 40MB/query,
and could be done in 6MB/query.  This optimization would not be too
difficult, as it could be mostly isolated to BooleanQuery and BooleanScorer.

Doug



Mime
View raw message