lucene-dev mailing list archives

From Wolfgang Hoschek <wolfgang.hosc...@mac.com>
Subject Re: Optimizing/minimizing memory usage of memory-based indexes
Date Sat, 11 Feb 2006 03:49:01 GMT
Hi Tatu,

I take it that simply maintaining the frequencies in a hashmap,
similar to
org.apache.lucene.index.memory.AnalyzerUtil.getMostFrequentTerms(),
isn't sufficient for your use cases?
If not, which are you using for the memory-based indexes:
org.apache.lucene.store.RAMDirectory or
org.apache.lucene.index.memory.MemoryIndex?
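Here is a minimal sketch of the hashmap approach, in plain Java with no Lucene dependency. The class name `TermCounter`, its methods, and the regex tokenizer are assumptions made for illustration. The tokenizer is a crude stand-in for a Lucene `Analyzer`, and `mostFrequent` only mimics the idea behind `AnalyzerUtil.getMostFrequentTerms()`; it does not reproduce that method's signature or behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: counts term frequencies incrementally in a HashMap
// instead of building an index. Tokenization (lowercase, split on anything
// that is not a letter or digit) is an assumption, NOT a Lucene Analyzer.
public class TermCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Add one document's text to the running counts.
    public void add(String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    // Current frequency of a term (0 if never seen).
    public int count(String term) {
        return counts.getOrDefault(term, 0);
    }

    // Up to 'limit' terms, most frequent first; ties broken alphabetically.
    public List<String> mostFrequent(int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        TermCounter tc = new TermCounter();
        tc.add("the quick brown fox");
        tc.add("The lazy dog and the fox");
        System.out.println(tc.mostFrequent(2)); // prints [the, fox]
    }
}
```

Nothing is merged here, so memory grows only with the number of distinct terms. This avoids the merge-time fluctuations described in the quoted message, at the cost of giving up Lucene's index structures.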

Wolfgang.

On Feb 10, 2006, at 12:29 PM, Tatu Saloranta wrote:

> I am building a simple classifier system, using Lucene
> essentially to calculate term frequencies efficiently
> and incrementally.
> (Due to input variations, I am currently creating a
> separate index for each attribute, although I guess I
> could (should?) just use a different field for each
> attribute.)
>
> Now, one potential problem is memory usage. Although
> it is probably sub-linear (I only index terms and don't
> store them, and the vocabulary grows sub-linearly), so
> the memory actually used should not grow too fast, it
> fluctuates because of the way Lucene builds and merges
> indexes: I assume memory usage mostly changes when
> segments are merged. I have a simple memory-usage
> diagnostic that forces gc every 1000 documents processed
> [yes, I know that System.gc() does not strictly guarantee
> it, but in practice it is good enough], and I see usage
> fluctuating a bit with an overall increase, but with a
> 10% drop every 12000 documents or so (with default
> settings).
>
> So... I am essentially wondering if there are good
> techniques for tuning memory usage adaptively (minimizing
> the size of the index structures) to avoid running out of
> memory in cases where compacting the index would prevent
> an out-of-memory error.
>
> Further, is it possible to trade reduced memory usage
> for slightly slower indexing (or, even better, slower
> searching; in my case, I only traverse the term indexes
> to get counts)? IndexWriter.optimize() probably does not
> really help here, does it?
>
> -+ Tatu +-
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
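The forced-gc diagnostic described in the quoted message can be sketched as follows. `HeapProbe`, `run`, and the `IntConsumer` standing in for the real indexing call are hypothetical names; only `Runtime.totalMemory()`, `Runtime.freeMemory()`, `Runtime.maxMemory()` and `System.gc()` are standard JDK calls. Since `System.gc()` is only a hint to the JVM, every reading is an estimate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical diagnostic: sample used heap every 'interval' documents.
public class HeapProbe {
    // Approximate bytes of heap in use after requesting a collection.
    // System.gc() is only a hint, so treat the result as an estimate.
    public static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Feeds docCount documents to 'indexDoc' (a stand-in for the real
    // indexing call) and records used heap every 'interval' documents.
    public static List<Long> run(int docCount, int interval, IntConsumer indexDoc) {
        List<Long> samples = new ArrayList<>();
        for (int i = 1; i <= docCount; i++) {
            indexDoc.accept(i);
            if (i % interval == 0) {
                samples.add(usedHeapAfterGc());
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // maxMemory() may return Long.MAX_VALUE if the heap has no inherent limit.
        long max = Runtime.getRuntime().maxMemory();
        List<Long> samples = run(5000, 1000, doc -> { /* index document 'doc' here */ });
        for (int k = 0; k < samples.size(); k++) {
            long used = samples.get(k);
            System.out.printf("after %d docs: %d KB used (%.1f%% of max heap)%n",
                    (k + 1) * 1000, used / 1024, 100.0 * used / max);
        }
    }
}
```

Comparing each sample against `maxMemory()` gives a rough headroom figure, which is one possible input to the adaptive tuning asked about; what to do when headroom runs low is left open here.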



