lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mathieu Lecarme <math...@garambrogne.net>
Subject Re: MoreLikeThis for multiple documents
Date Thu, 26 Jul 2007 13:20:13 GMT
Jens Grivolla a écrit :
> Hello,
>
> I'm looking to extract significant terms characterizing a set of
> documents (which in turn relate to a topic).
>
> This basically comes down to functionality similar to determining the
> terms with the greatest offer weight (as used for blind relevance
> feedback), or maximizing tf.idf (as is done in MoreLikeThis).
>
> Is there anything like this already implemented, or do I need to
> iterate through all documents in the set "manually", re-tokenize each
> one (or maybe use TermVectors), and then calculate the weight for each
> term? 
http://project.carrot2.org/index.html may be your friend.

M.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message