lucene-dev mailing list archives

From Wolfgang Hoschek
Subject Re: Optimizing/minimizing memory usage of memory-based indexes
Date Sat, 11 Feb 2006 03:49:01 GMT
Hi Tatu,

I take it that simply maintaining the frequencies in a hashmap  
similar to  
isn't sufficient for your use cases?
In the latter case, are you using or  
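
For reference, a minimal sketch of the plain-hashmap approach, assuming modern Java (8+), naive whitespace tokenization, and one counter instance per attribute. The class name TermFrequencyCounter and its methods are hypothetical, not part of Lucene:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: keeps term frequencies in a plain HashMap. */
final class TermFrequencyCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Adds the terms of one document. Tokenization here is a naive
     *  lowercase + whitespace split (an assumption for illustration). */
    void addDocument(String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum); // incremental update
            }
        }
    }

    /** Number of occurrences of the term seen so far (0 if unseen). */
    int frequency(String term) {
        return counts.getOrDefault(term, 0);
    }

    /** Number of distinct terms seen so far. */
    int vocabularySize() {
        return counts.size();
    }

    public static void main(String[] args) {
        TermFrequencyCounter counter = new TermFrequencyCounter();
        counter.addDocument("the quick brown fox");
        counter.addDocument("the lazy dog");
        System.out.println(counter.frequency("the"));  // prints 2
        System.out.println(counter.vocabularySize());  // prints 6
    }
}
```

Memory then grows with the vocabulary only, and there is no segment merging to cause fluctuations.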


On Feb 10, 2006, at 12:29 PM, Tatu Saloranta wrote:

> I am building a simple classifier system, using Lucene
> essentially to calculate term frequencies efficiently
> and incrementally.
> (Due to input variations, I am currently creating a
> separate index for each attribute, although I guess I
> could (should?) just use a different field for each
> attribute.)
> Now, one potential problem I have is that although
> memory usage is probably sub-linear (I just index
> terms, don't store; vocabulary grows sub-linearly),
> and thus actual memory used should not grow too fast,
> memory use fluctuates with the way Lucene builds and
> merges indexes: I assume memory usage mostly changes
> when merging segments. I have simple diagnostics for
> memory usage that force gc every 1000 documents
> processed [yes, I know that System.gc() does not
> strictly guarantee it, but in practice it is good
> enough], and notice usage fluctuating a bit, with an
> overall increase (but a 10% drop every 12000 documents
> or so, with default settings).
> So... I am essentially wondering if there are good
> techniques for tuning memory usage (minimize index
> structure size) adaptively, to avoid running out of
> memory, in cases where compacting the index would
> prevent an out-of-memory failure.
> Further, are there possibilities to perhaps trade
> reduced memory usage for slightly slower indexing? (or
> even better, searching -- in my case, I only traverse
> term indexes to get counts). IndexWriter.optimize()
> probably does not really help here, does it?
> -+ Tatu +-
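
The memory diagnostic described in the quoted message (request a gc every N documents, then read heap usage) could look roughly like the sketch below. The class MemoryProbe and its methods are hypothetical names for illustration, and System.gc() remains only a hint to the JVM, as noted above:

```java
/** Hypothetical diagnostic: samples used heap every N processed documents. */
final class MemoryProbe {
    private final int interval;
    private int processed = 0;

    MemoryProbe(int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    /** Used heap in bytes after requesting a collection.
     *  System.gc() is a request, not a guarantee. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Call once per indexed document. Returns the used heap in bytes on
     *  sampling steps (every 'interval' documents), otherwise -1. */
    long documentProcessed() {
        processed++;
        if (processed % interval != 0) {
            return -1L;
        }
        long used = usedHeapBytes();
        System.out.println(processed + " docs, used heap: " + (used / 1024) + " KB");
        return used;
    }

    public static void main(String[] args) {
        MemoryProbe probe = new MemoryProbe(1000);
        for (int i = 0; i < 3000; i++) {
            // indexing of one document would happen here
            probe.documentProcessed();
        }
    }
}
```

Logging the sampled values over a long run makes both the overall growth and the periodic drops visible.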