lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Goetzke" <uwe.goet...@healy-hudson.com>
Subject AW: Most frequently indexed term
Date Mon, 08 Jun 2009 10:52:14 GMT
Hello Ganesh,

What about  making a seperate index for each day, get your analysis and merge thereafter that
index.

I am not sure but I think this might work. Use MultiSearcher for the search.

Regards

Uwe Goetzke


-----Ursprüngliche Nachricht-----
Von: Ganesh [mailto:emailgane@yahoo.co.in] 
Gesendet: Montag, 8. Juni 2009 12:31
An: java-user@lucene.apache.org
Betreff: Re: Most frequently indexed term

Thanks. This works well.

The logic is 
1. Do the search, For every document get the list of terms and its frequency. 
2. Use SortedTermVectorMapper to generate a list of unique terms and its frequency. 
2. Sort them to get the list of top numbered frequently indexed terms in a given date range
(any given criteria).

My Question is:
I need to get the top 20 highly indexed term in a day. 1 million documents could be indexed
in a day. I need to traverse the 1 million records and store the unique terms and its frequencies.
It may consume huge amount of memory. Is there any other way out? With out using term vector,
i could get the list of most frequently indexed term in a database. Similarly is there any
other way to get the list of most frequently indexed term in a date range or a subset of database.

Regards
Ganesh 



----- Original Message ----- 
From: "Preetham Kajekar" <preetham@cisco.com>
To: <java-user@lucene.apache.org>
Sent: Tuesday, May 26, 2009 11:08 PM
Subject: Re: Most frequently indexed term


> Have a look at
> http://stackoverflow.com/questions/195434/how-can-i-get-top-terms-for-a-subset-of-documents-in-a-lucene-index
> 
> (I have not tried the above out)
> 
> Ganesh wrote:
>> Hello All,
>>
>> I need to build some stats. I need to know Top 5 frequently indexed term in a date
range (In a day or a Month).
>>
>> Any idea of how to achieve this.
>>
>> Regards
>> GaneshIéÝŠ{-j{fzˁë-£*.®‰åŠwŸ®'§vÈm¶ŸÿŠyž²Ç§êòj(com=
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>IéÝŠ{-j{fzˁë-£*.®‰åŠwŸ®'§vÈm¶ŸÿŠyž²Ç§êòj(

-----------------------------------------------------------------------
Healy Hudson GmbH - D-55252 Mainz Kastel
Geschäftsführer Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076

Diese Email ist vertraulich. Wenn Sie nicht der beabsichtigte Empfänger sind, dürfen Sie
die Informationen nicht offen legen oder benutzen. Wenn Sie diese Email durch einen Fehler
bekommen haben, teilen Sie uns dies bitte umgehend mit, indem Sie diese Email an den Absender
zurückschicken. Bitte löschen Sie danach diese Email.
This email is confidential. If you are not the intended recipient, you must not disclose or
use this information contained in it. If you have received this email in error please tell
us immediately by return email and delete the document.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message