lucene-java-user mailing list archives

From Jens Grivolla
Subject MoreLikeThis for multiple documents
Date Wed, 25 Jul 2007 19:04:31 GMT

I'm looking to extract significant terms characterizing a set of 
documents (which in turn relate to a topic).

This basically comes down to functionality similar to determining the 
terms with the greatest offer weight (as used for blind relevance 
feedback), or maximizing tf.idf (as is done in MoreLikeThis).

Is there anything like this already implemented, or do I need to iterate 
through all documents in the set "manually", re-tokenize each one (or 
maybe use TermVectors), and then calculate the weight for each term?
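
A minimal sketch of the "manual" approach described above, in plain Python rather than against the Lucene API. It assumes the documents in the set are already tokenized (for example from stored term vectors) and that per-term document frequencies for the whole index are available as a dictionary. The function name, parameter names, and the smoothed idf formula are illustrative assumptions, not part of Lucene or MoreLikeThis.

```python
import math
from collections import Counter


def significant_terms(doc_set, doc_freq, num_docs, top_n=10):
    """Rank the terms of a document set by summed tf * idf.

    doc_set  -- list of token lists, one per document in the set (assumed input)
    doc_freq -- dict mapping term -> number of index documents containing it
    num_docs -- total number of documents in the index
    """
    # Sum term frequencies over every document in the set.
    tf = Counter()
    for tokens in doc_set:
        tf.update(tokens)

    scores = {}
    for term, freq in tf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # no index statistics for this term, so skip it
        # Smoothed idf; an assumption chosen for illustration.
        idf = math.log(num_docs / (df + 1)) + 1.0
        scores[term] = freq * idf

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical data: two documents from the set and made-up index statistics.
docs = [["lucene", "index", "the"], ["lucene", "query", "the"]]
stats = {"lucene": 10, "index": 50, "query": 40, "the": 1000}
top = significant_terms(docs, stats, num_docs=1000, top_n=4)
```

Summing raw term frequencies favors longer documents; normalizing each document's counts before summing, or swapping in an offer-weight formula, would change only the scoring line.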

