lucene-java-user mailing list archives

From Mihai Caraman <caraman.mi...@gmail.com>
Subject HighFreqTerms for results set
Date Mon, 18 Jul 2011 12:33:56 GMT
I looked around and found no viable solution to this problem:
how to extract the most frequent terms in the search result set after
submitting a query.

HighFreqTerms
<http://lucene.apache.org/java/3_2_0/api/contrib-misc/index.html> and docFreq
<http://lucene.apache.org/java/3_2_0/api/core/org/apache/lucene/index/FilterIndexReader.html#docFreq%28org.apache.lucene.index.Term%29> don't
do the job for a specific set of documents.

- Is it plausible to build a vector of the resulting doc IDs and intersect it with
each term's posting list in the index? A bigger intersection would mean a higher
frequency within the result set.
  * Because search results can be arbitrary, this method can't be
optimized to intersect only the highest-frequency terms of the entire index.
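
To make the idea concrete, here is a minimal sketch of the intersection approach in plain Java. It does not use the Lucene API: the posting lists are assumed to be already available as a map from term to a sorted array of doc IDs, and the class and method names (ResultSetTermFrequency, topTerms, intersectionSize) are made up for illustration. In Lucene 3.x the posting lists would presumably be read through IndexReader.terms() and IndexReader.termDocs(Term); that wiring is left out here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: ranks terms by how many
// documents of a given result set contain them.
final class ResultSetTermFrequency {

    // Counts the doc IDs present in both arrays. Both must be sorted ascending.
    static int intersectionSize(int[] a, int[] b) {
        int i = 0, j = 0, count = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                count++;
                i++;
                j++;
            }
        }
        return count;
    }

    // Returns up to k (term, count) pairs, highest count first; ties are
    // broken alphabetically. Terms that match no result document are skipped.
    // Assumption: every posting list is sorted ascending by doc ID.
    static List<Map.Entry<String, Integer>> topTerms(
            Map<String, int[]> postings, int[] resultDocIds, int k) {
        int[] results = resultDocIds.clone();
        Arrays.sort(results); // search hits usually arrive in score order

        List<Map.Entry<String, Integer>> counts = new ArrayList<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            int n = intersectionSize(results, e.getValue());
            if (n > 0) {
                counts.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), n));
            }
        }
        counts.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return new ArrayList<>(counts.subList(0, Math.min(k, counts.size())));
    }
}
```

This visits every term in the index once per query, which is the cost the note above points at: without knowing the result set in advance, no term can be ruled out early.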
