lucene-java-user mailing list archives

From "Mike O'Leary" <tmole...@uw.edu>
Subject Obtaining IDF values for the terms in a document set
Date Thu, 15 Dec 2011 17:33:49 GMT
We have a large set of documents that we would like to index with a customized stopword list.
We have run tests by indexing a random set of about 10% of the documents, and we'd like to
generate a list of the terms in that smaller set and their IDF values as a way to create a
starter set of stopwords for the larger document set by selecting the terms that have the
lowest IDF values. First of all, is this the best way to create a stopword list? Second, is
there a straightforward way to generate a list of terms and their IDF values from a Lucene
index?
Thanks,
Mike
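Below is a minimal sketch of the ranking step. It assumes Lucene 3.x-era scoring, where `DefaultSimilarity` documents its IDF as log(numDocs / (docFreq + 1)) + 1. The counts would come from the sample index. In Lucene 3.x one way to collect them is to walk `IndexReader.terms()`, reading `TermEnum.term()` and `TermEnum.docFreq()`, and to use `IndexReader.numDocs()` for the document count. That enumeration covers every field, so you would probably keep only terms whose `Term.field()` is the text field you care about. To keep the sketch standalone and runnable, it does not call Lucene. It takes the document frequencies as a plain map. The class name, method names, terms, and counts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IdfStopwordCandidates {

    // IDF as documented for Lucene 3.x DefaultSimilarity: ln(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Returns up to `limit` terms with the lowest IDF (the highest document frequency).
    // Ties are broken alphabetically so the output is deterministic.
    static List<String> lowestIdfTerms(Map<String, Integer> docFreqs, int numDocs, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreqs.entrySet());
        entries.sort(Comparator
                .comparingDouble((Map.Entry<String, Integer> e) -> idf(e.getValue(), numDocs))
                .thenComparing(e -> e.getKey()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies for a 1,000-document sample index.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 990);
        docFreqs.put("patient", 420);
        docFreqs.put("of", 975);
        docFreqs.put("lucene", 12);
        int numDocs = 1000;

        // Print the three stopword candidates with their IDF values.
        for (String term : lowestIdfTerms(docFreqs, numDocs, 3)) {
            System.out.printf("%s\t%.4f%n", term, idf(docFreqs.get(term), numDocs));
        }
    }
}
```

With these made-up counts, the candidates are "the", "of", and "patient", in that order. IDF falls steadily as document frequency rises, so sorting by lowest IDF gives the same order as sorting by highest docFreq. IDF itself is needed only if you want a cutoff expressed on the IDF scale.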
