lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Israel Tsadok <itsa...@gmail.com>
Subject Re: HighFreqTerms for results set
Date Tue, 19 Jul 2011 07:39:32 GMT
We faced this problem a long time ago, and ended up just extracting all the
matching documents, re-analyzing and counting the terms using a
MultiSet<http://guava-libraries.googlecode.com/svn/tags/release02/javadoc/com/google/common/collect/Multiset.html>.
It was very slow, but it worked.
You might get better perfomance if you index with term vectors and use
IndexReader#getTermFreqVector()<http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/index/IndexReader.html#getTermFreqVector(int,
java.lang.String)> to count the number of appearances of each term in a
document.

Either way, I don't know of a fast way to do this.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message