lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ganesh" <emailg...@yahoo.co.in>
Subject Re: Most frequently indexed term
Date Mon, 08 Jun 2009 10:30:32 GMT
Thanks. This works well.

The logic is 
1. Do the search, For every document get the list of terms and its frequency. 
2. Use SortedTermVectorMapper to generate a list of unique terms and its frequency. 
2. Sort them to get the list of top numbered frequently indexed terms in a given date range
(any given criteria).

My Question is:
I need to get the top 20 highly indexed term in a day. 1 million documents could be indexed
in a day. I need to traverse the 1 million records and store the unique terms and its frequencies.
It may consume huge amount of memory. Is there any other way out? With out using term vector,
i could get the list of most frequently indexed term in a database. Similarly is there any
other way to get the list of most frequently indexed term in a date range or a subset of database.

Regards
Ganesh 



----- Original Message ----- 
From: "Preetham Kajekar" <preetham@cisco.com>
To: <java-user@lucene.apache.org>
Sent: Tuesday, May 26, 2009 11:08 PM
Subject: Re: Most frequently indexed term


> Have a look at
> http://stackoverflow.com/questions/195434/how-can-i-get-top-terms-for-a-subset-of-documents-in-a-lucene-index
> 
> (I have not tried the above out)
> 
> Ganesh wrote:
>> Hello All,
>>
>> I need to build some stats. I need to know Top 5 frequently indexed term in a date
range (In a day or a Month).
>>
>> Any idea of how to achieve this.
>>
>> Regards
>> GaneshI݊{-j{fz-*.w'vmyǧj(com=
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>I݊{-j{fz-*.w'vm),zx
Mime
View raw message