lucene-dev mailing list archives

From: Tatu Saloranta <>
Subject: Optimizing/minimizing memory usage of memory-based indexes
Date: Fri, 10 Feb 2006 20:29:16 GMT
I am building a simple classifier system, using Lucene
essentially to efficiently+incrementally calculate
term frequencies.
(Due to input variations, I am currently creating a
separate index for each attribute, although I guess I
could (should?) just use a different field for each
attribute.)
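For what it's worth, the one-field-per-attribute variant would look roughly like this (a minimal sketch against the Lucene 1.9-era API; the attribute names and token strings are made up, and I index only, without storing):

```java
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class AttributeFields {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        // 'true' => create a new index in this directory
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);

        Document doc = new Document();
        // One field per attribute, instead of one index per attribute;
        // terms are indexed but the original values are not stored.
        doc.add(new Field("attr1", "token stream for attribute one",
                          Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("attr2", "other tokens",
                          Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();
    }
}
```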

Now, one potential problem I have is that although
memory usage is probably sub-linear (I only index
terms, don't store them; vocabulary grows sub-linearly),
and thus actual memory used should not grow too fast,
the way Lucene builds and merges indexes makes usage
fluctuate: I assume memory usage mostly changes when
segments are merged. I have simple diagnostics for
memory usage that force a gc every 1000 documents
processed [yes, I know that System.gc() does not
strictly guarantee a collection, but in practice it is
good enough], and I notice usage fluctuating a bit,
with an overall increase, but with a ~10% drop every
12000 documents or so, with the default merge settings.
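For reference, the diagnostic I mean can be sketched without any Lucene dependency (the 1000-document interval matches what I described; the indexing step itself is elided):

```java
public class MemoryProbe {
    // Report used heap after requesting a GC. System.gc() is only a
    // hint to the JVM, but in practice it is usually honored.
    static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        final int REPORT_INTERVAL = 1000; // documents between reports
        for (int docCount = 1; docCount <= 5000; docCount++) {
            // ... index one document here ...
            if (docCount % REPORT_INTERVAL == 0) {
                System.out.println("docs=" + docCount
                        + " usedBytes=" + usedHeapAfterGc());
            }
        }
    }
}
```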

So... I am essentially wondering whether there are good
techniques for tuning memory usage (minimizing index
structure size) adaptively, to avoid running out of
memory in cases where compacting the index would
prevent it.
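The knobs I have found so far look like this (a sketch assuming the setter methods in Lucene 1.9; on 1.4.x the equivalents are the public mergeFactor and minMergeDocs fields on IndexWriter -- the specific values below are illustrative, not recommendations):

```java
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class TunedWriter {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                new RAMDirectory(), new WhitespaceAnalyzer(), true);
        // Smaller in-memory document buffer: flush a segment every
        // 100 documents instead of waiting for the default threshold.
        writer.setMaxBufferedDocs(100);
        // Lower merge factor: fewer segments held at once, at the
        // cost of more frequent (slower) merges.
        writer.setMergeFactor(2);
        // ... addDocument() calls here ...
        writer.close();
    }
}
```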

Further, is it possible to trade reduced memory usage
for slightly slower indexing? (Or, even better, slower
searching -- in my case, I only traverse term indexes
to get counts.) IndexWriter.optimize() probably does
not really help here, does it?
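To be concrete, the count traversal I mean is just the standard TermEnum walk (a sketch; `dir` is assumed to already hold the index built earlier):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class TermCounts {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory(); // assumed: index built earlier
        IndexReader reader = IndexReader.open(dir);
        TermEnum terms = reader.terms();
        while (terms.next()) {
            Term t = terms.term();
            // docFreq() = number of documents containing this term
            System.out.println(t.field() + ":" + t.text()
                    + " df=" + terms.docFreq());
        }
        terms.close();
        reader.close();
    }
}
```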

-+ Tatu +-
