lucene-java-user mailing list archives

From Grant Ingersoll <>
Subject Re: Term Frequency vector consumes memory
Date Tue, 30 Jun 2009 16:18:18 GMT
In Lucene, a Term Vector is a specific per-document structure that is
stored on disk when it is enabled for a Field at indexing time.  It is
optional and off by default.  It is separate from being able to get
the term frequencies for all the docs in a specific field.  Whether
term vectors are stored is decided at indexing time, and there is no
way to remove them w/o reindexing.  Furthermore, they are not loaded
into memory by the IndexReader.  Term frequencies are accessed via
TermDocs.
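
To illustrate why top-term stats need not cost much RAM, here is a
minimal sketch in plain Java (no Lucene dependency) that keeps only the
N largest counts while streaming over (term, count) pairs.  The names
TopTerms, TermCount and topN are hypothetical and are not part of
Lucene.  In a Lucene 2.x index the pairs would presumably come from
walking IndexReader.terms() and reading TermEnum.docFreq() for each
term, which does not require term vectors; the Iterator below is only a
stand-in for that enumeration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper (not a Lucene class): streaming top-N selection. */
final class TopTerms {

    /** A term paired with its count (e.g. a document frequency). */
    static final class TermCount {
        final String term;
        final int count;

        TermCount(String term, int count) {
            this.term = term;
            this.count = count;
        }

        @Override
        public String toString() {
            return term + "=" + count;
        }
    }

    /** Orders entries by ascending count, so the heap root is the smallest. */
    private static final Comparator<TermCount> BY_COUNT =
            Comparator.comparingInt(tc -> tc.count);

    /**
     * Returns the n entries with the highest counts, highest first.
     * Only n entries are held in memory at any time, regardless of how
     * many terms the iterator yields.
     */
    static List<TermCount> topN(Iterator<Map.Entry<String, Integer>> terms, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<TermCount> heap = new PriorityQueue<>(BY_COUNT);
        while (terms.hasNext()) {
            Map.Entry<String, Integer> e = terms.next();
            int count = e.getValue();
            if (heap.size() < n) {
                heap.add(new TermCount(e.getKey(), count));
            } else if (count > heap.peek().count) {
                heap.poll();                      // evict the current smallest
                heap.add(new TermCount(e.getKey(), count));
            }
        }
        List<TermCount> result = new ArrayList<>(heap);
        result.sort(BY_COUNT.reversed());         // highest count first
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample counts standing in for a real term enumeration.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("lucene", 5);
        counts.put("index", 9);
        counts.put("vector", 2);
        counts.put("reader", 7);
        System.out.println(topN(counts.entrySet().iterator(), 2));
    }
}
```

The bounded min-heap is the design choice here: memory use depends on
N, not on the number of distinct terms in the index.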

Can you clarify a bit more what you are looking to do?  Perhaps some  
sample code will help demonstrate what you'd like to turn off, as I am  
not clear on your question.


On Jun 30, 2009, at 3:37 AM, Ganesh wrote:

> At the end of the day, I build stats of the top indexed
> terms. I enabled term frequency for a single field. It is working
> fine: I am able to get the top terms and their frequencies. But it
> consumes a huge amount of RAM. My index size is 5 GB and has 8 million
> records. If I don't enable term vectors, I can index up to
> 17 GB with 40 million records.
> When an IndexReader/Searcher is opened, does it load all the term
> vector frequencies?
> Suppose I have enabled this option and indexed, say, 5 GB. Now I don't
> want the Reader/Searcher to load term vectors. I want to switch off
> this feature. Is that possible without re-indexing?
> Regards
> Ganesh
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll

