lucene-java-user mailing list archives

From "Ganesh" <emailg...@yahoo.co.in>
Subject Re: Term Frequency vector consumes memory
Date Wed, 01 Jul 2009 05:39:11 GMT
Thanks for your reply.

My requirement is to fetch the list of the most frequent terms indexed in a day. I used the
approach described in this Stack Overflow thread:
http://stackoverflow.com/questions/195434/how-can-i-get-top-terms-for-a-subset-of-documents-in-a-lucene-index

I enabled term vectors for one field and indexed the content, and I am able to retrieve the
list of top indexed terms for a day / date range.
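A minimal plain-Java sketch of this term-vector approach, under stated assumptions (it is not Ganesh's actual code and uses no Lucene classes): the hypothetical `DocVector` class stands in for the per-document vector that Lucene 2.x returns from `IndexReader.getTermFreqVector(docId, field)` (`getTerms()` / `getTermFrequencies()`), and the list of documents in the day / date range is assumed to have been collected already, e.g. from a range query. The sketch sums the per-document frequencies and keeps the top N with a bounded min-heap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Sums per-document term frequency vectors over a subset of documents
 * (e.g. one day's documents) and returns the n most frequent terms.
 * DocVector is a hypothetical stand-in for Lucene's TermFreqVector;
 * no Lucene classes are used here.
 */
final class TermVectorTopTerms {

    /** Hypothetical stand-in for one document's term frequency vector. */
    static final class DocVector {
        final String[] terms; // like TermFreqVector.getTerms()
        final int[] freqs;    // like TermFreqVector.getTermFrequencies()

        DocVector(String[] terms, int[] freqs) {
            this.terms = terms;
            this.freqs = freqs;
        }
    }

    /** Returns up to n (term, total) pairs, most frequent first; ties go to the alphabetically smaller term. */
    static List<Map.Entry<String, Integer>> topTerms(List<DocVector> docsInRange, int n) {
        // 1. Aggregate frequencies across all documents in the range.
        Map<String, Integer> totals = new HashMap<>();
        for (DocVector doc : docsInRange) {
            for (int i = 0; i < doc.terms.length; i++) {
                totals.merge(doc.terms[i], doc.freqs[i], Integer::sum);
            }
        }

        // 2. Keep the best n in a min-heap whose head is the weakest candidate.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(
                (a, b) -> !a.getValue().equals(b.getValue())
                        ? Integer.compare(a.getValue(), b.getValue())
                        : b.getKey().compareTo(a.getKey()));
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the weakest candidate
            }
        }

        // 3. Drain weakest-first, then reverse so the most frequent term comes first.
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        List<DocVector> oneDay = List.of(
                new DocVector(new String[] {"lucene", "index"}, new int[] {3, 1}),
                new DocVector(new String[] {"lucene", "memory"}, new int[] {2, 4}),
                new DocVector(new String[] {"index", "vector"}, new int[] {2, 1}));
        System.out.println(topTerms(oneDay, 2)); // prints [lucene=5, memory=4]
    }
}
```

Note that the `totals` map holds every distinct term seen in the range, so in this sketch memory grows with the vocabulary of the selected documents, not just with N.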

When an IndexReader / Searcher is opened, does it load all the term vector frequencies into
memory?

Suppose I have enabled this option and indexed, say, 5 GB. Now I don't want the Reader /
Searcher to load term vectors; I want to switch this feature off. Is that possible without
re-indexing?

Regards
Ganesh

----- Original Message ----- 
From: "Grant Ingersoll" <gsingers@apache.org>
To: <java-user@lucene.apache.org>
Sent: Tuesday, June 30, 2009 9:48 PM
Subject: Re: Term Frequency vector consumes memory


> In Lucene, a Term Vector is a specific thing that is stored on disk  
> when creating a Document and Field.  It is optional and off by  
> default.  It is separate from being able to get the term frequencies  
> for all the docs in a specific field.  The former is decided at  
> indexing time and there is no way to remove it w/o reindexing.   
> Furthermore, it is not loaded into memory by the IndexReader.  Term  
> Frequencies are accessed via the TermDocs.
> 
> Can you clarify a bit more what you are looking to do?  Perhaps some  
> sample code will help demonstrate what you'd like to turn off, as I am  
> not clear on your question.
> 
> Cheers,
> Grant
> 
> On Jun 30, 2009, at 3:37 AM, Ganesh wrote:
> 
>> At the end of each day, I build statistics of the top indexed
>> terms. I enabled term frequency vectors for a single field. It works
>> fine: I am able to get the top terms and their frequencies. However,
>> it consumes a huge amount of RAM. My index is 5 GB and has 8 million
>> records. Without term vectors enabled, I could index up to 17 GB
>> with 40 million records.
>>
>> When an IndexReader / Searcher is opened, does it load all the term
>> vector frequencies into memory?
>>
>> Suppose I have enabled this option and indexed, say, 5 GB. Now I
>> don't want the Reader / Searcher to load term vectors; I want to
>> switch this feature off. Is that possible without re-indexing?
>>
>> Regards
>> Ganesh
>> Send instant messages to your online friends http://in.messenger.yahoo.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
>
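Grant's point that term frequencies can be read from the postings (`TermDocs`) without any term vectors can also be sketched in plain Java. Assumptions: the hypothetical `Posting` class and the postings map stand in for a Lucene 2.x walk over `IndexReader.terms()` and `IndexReader.termDocs(term)` (using `doc()` and `freq()`), and the `BitSet` of documents in the date range is assumed to be built separately. No Lucene classes are used; this is an illustration of the idea, not a drop-in implementation.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Per-term frequency totals for a subset of documents, computed from
 * (doc, freq) postings instead of term vectors. The Posting class and the
 * postings map are hypothetical stand-ins for a TermEnum/TermDocs walk.
 */
final class PostingsTermTotals {

    /** Hypothetical stand-in for one TermDocs entry. */
    static final class Posting {
        final int doc;  // like TermDocs.doc()
        final int freq; // like TermDocs.freq()

        Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }
    }

    /** Sums, per term, the frequencies of postings whose document is set in docsInRange. */
    static Map<String, Integer> totalsFor(Map<String, List<Posting>> postings, BitSet docsInRange) {
        Map<String, Integer> totals = new TreeMap<>(); // sorted by term for stable output
        for (Map.Entry<String, List<Posting>> term : postings.entrySet()) {
            int sum = 0;
            for (Posting p : term.getValue()) {
                if (docsInRange.get(p.doc)) {
                    sum += p.freq; // only count occurrences in the selected documents
                }
            }
            if (sum > 0) {
                totals.put(term.getKey(), sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> postings = new HashMap<>();
        postings.put("lucene", List.of(new Posting(0, 3), new Posting(1, 2), new Posting(5, 7)));
        postings.put("memory", List.of(new Posting(1, 4)));
        postings.put("vector", List.of(new Posting(5, 1)));

        BitSet today = new BitSet();
        today.set(0, 2); // hypothetical: docs 0 and 1 were indexed today
        System.out.println(totalsFor(postings, today)); // prints {lucene=5, memory=4}
    }
}
```

The trade-off in this sketch is that it visits every posting of every term, so its cost follows the size of the whole index rather than the size of the day's subset, but it needs no term vectors at all.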
