lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Term Frequency vector consumes memory
Date Thu, 02 Jul 2009 12:45:53 GMT

On Jul 1, 2009, at 1:39 AM, Ganesh wrote:

> Thanks for your reply.
>
> My requirement is to fetch the list of the most frequent terms indexed
> in a day. I used the logic described in the answer linked below:
> http://stackoverflow.com/questions/195434/how-can-i-get-top-terms-for-a-subset-of-documents-in-a-lucene-index
>
> I enabled term vectors for a field and indexed the content, and I am
> able to retrieve the list of top indexed terms for a day / date range.
>
> When an IndexReader/Searcher is opened, will it load all term
> vector frequencies?

No, it won't. Term vectors are stored on disk, much like stored fields,
and are not loaded into memory when the reader is opened.

>
> Suppose I have enabled this option and indexed, say, 5 GB. Now I
> don't want the Reader/Searcher to load term vectors. I want to
> switch off this feature. Is that possible without re-indexing?

I suppose so. The approach you are using seems to rely on a custom
Collector, though, so switching it off means not using that Collector.
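
As a rough illustration of the aggregation step in that kind of
approach, here is a minimal sketch in plain Java. It assumes each
matching document yields parallel arrays of terms and frequencies, and
it deliberately leaves out the Lucene calls that would supply them. The
class TopTermCounter and its methods are hypothetical names, not part of
Lucene. Note that the totals map grows with the number of distinct terms
seen, so this step can hold a fair amount of heap on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper (not a Lucene class): sums per-document term counts. */
final class TopTermCounter {
    // Running total per term; grows with the number of distinct terms seen.
    private final Map<String, Integer> totals = new HashMap<>();

    /** Adds one document's terms and frequencies, given as parallel arrays. */
    void addDocument(String[] terms, int[] freqs) {
        for (int i = 0; i < terms.length; i++) {
            totals.merge(terms[i], freqs[i], Integer::sum);
        }
    }

    /** Returns up to n entries ordered by descending total frequency. */
    List<Map.Entry<String, Integer>> top(int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap on frequency: the root is the weakest of the current top n.
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the least frequent candidate
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }
}
```

Calling addDocument once per matching document and top(n) at the end
gives the top terms for the subset; the order of terms with equal totals
is not defined in this sketch.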

Storing term vectors will indeed make your index much bigger, but it
shouldn't affect memory much unless you are caching, which probably
isn't a bad idea anyway.
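
If caching is used, bounding it keeps the memory cost predictable. The
sketch below is one possible way to do that with a small LRU map built
on java.util.LinkedHashMap. BoundedLruCache is a hypothetical name, and
what gets cached (for example, per-document term data keyed by document
id) is an assumption, not something the thread specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded LRU cache; not part of Lucene. */
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cap is exceeded.
        return size() > maxEntries;
    }
}
```

This class is not thread-safe; wrapping it with
Collections.synchronizedMap is one option if several search threads
share it.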



>
> Regards
> Ganesh
>
> ----- Original Message -----
> From: "Grant Ingersoll" <gsingers@apache.org>
> To: <java-user@lucene.apache.org>
> Sent: Tuesday, June 30, 2009 9:48 PM
> Subject: Re: Term Frequency vector consumes memory
>
>
>> In Lucene, a Term Vector is a specific thing that is stored on disk
>> when creating a Document and Field.  It is optional and off by
>> default.  It is separate from being able to get the term frequencies
>> for all the docs in a specific field.  The former is decided at
>> indexing time and there is no way to remove it w/o reindexing.
>> Furthermore, it is not loaded into memory by the IndexReader.  Term
>> Frequencies are accessed via the TermDocs.
>>
>> Can you clarify a bit more what you are looking to do?  Perhaps some
>> sample code will help demonstrate what you'd like to turn off, as I
>> am not clear on your question.
>>
>> Cheers,
>> Grant
>>
>> On Jun 30, 2009, at 3:37 AM, Ganesh wrote:
>>
>>> At the end of the day, I build the stats of top indexed
>>> terms. I enabled term frequency for a single field. It is working
>>> fine. I was able to get the top terms and their frequencies, but it
>>> consumes a huge amount of RAM. My index size is 5 GB and has 8 million
>>> records. If I didn't enable term vectors, I could index up to
>>> 17 GB with 40 million records.
>>>
>>> When an IndexReader/Searcher is opened, will it load all term
>>> vector frequencies?
>>>
>>> Suppose I have enabled this option and indexed, say, 5 GB. Now I don't
>>> want the Reader/Searcher to load term vectors. I want to switch off
>>> this feature. Is that possible without re-indexing?
>>>
>>> Regards
>>> Ganesh
>>> Send instant messages to your online friends http://in.messenger.yahoo.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/


