lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Burton-West, Tom" <tburt...@umich.edu>
Subject RE: High frequency term for the searched query
Date Thu, 04 Nov 2010 15:40:00 GMT
Can you give more details about what you want?  Perhaps with an example?
Do you want the number of documents containing the query term, the number of occurrences of
the query term within a document, or the number of occurrences of the term in the entire index?

You can use an explain query to get information on the number of occurrences within each document
and the number of documents within the index  searcher.explain(query, doc)

If you want the number of occurrences of the term in the entire index, you can use 
org/apache/lucene/misc/GetTermInfo.java. You can give it a term and it will look up the total
number of documents containing the term and the total number of occurrences of the term in
the index.  


http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/misc/src/java/org/apache/lucene/misc/GetTermInfo.java?revision=957522&view=markup

Tom

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search

-----Original Message-----
From: starz10de [mailto:farag_ahmed@yahoo.com] 
Sent: Thursday, November 04, 2010 3:54 AM
To: java-user@lucene.apache.org
Subject: High frequency term for the searched query


I need to find the most frequent terms that are appeared with a query. 

HighFreqTerms.java can be used only to obtain the high frequency terms in
the whole index. 

I need just to find the high frequency terms to the submitted query. 

What I do now is:

I search the index with the query and retrieve the relevant documents then
save those documents in a new folder then index them. At the end I use
HighFreqTerms.java in the new index so I can find the most frequent terms to
the query. However, this is very slow and need long time to run.

Any idea how I can do this task efficiently 


Thanks in advance

-- 
View this message in context: http://lucene.472066.n3.nabble.com/High-frequency-term-for-the-searched-query-tp1839942p1839942.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message