lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: MoreLikeThis for multiple documents
Date Thu, 26 Jul 2007 15:23:00 GMT
I have some sample code for doing relevance feedback across multiple  
documents at http://www.cnlp.org/apachecon2005

It could be modified to provide more of the MoreLikeThis
functionality (i.e. determining important terms via tf/idf); for now
it just takes the top X terms.
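
As a rough illustration of the tf/idf variant, here is a minimal plain-Java sketch. It is not the sample code from the link above and does not use the Lucene API: it assumes the documents are already tokenized into lists of terms, and the class name SignificantTerms, the method topTerms, and the scoring formula (term frequency summed over the document set, times log(N/df) over the whole corpus) are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: ranks the terms of a document
// subset by (tf summed over the subset) * log(N / df over the corpus).
final class SignificantTerms {

    static List<String> topTerms(List<List<String>> subset,
                                 List<List<String>> corpus, int k) {
        // Document frequency: number of corpus documents containing each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }

        // Term frequency, summed over all documents in the subset.
        Map<String, Integer> tf = new HashMap<>();
        for (List<String> doc : subset) {
            for (String term : doc) {
                tf.merge(term, 1, Integer::sum);
            }
        }

        // Score each subset term; terms missing from the corpus get df = 1.
        int n = corpus.size();
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int docFreq = df.getOrDefault(e.getKey(), 1);
            score.put(e.getKey(), e.getValue() * Math.log((double) n / docFreq));
        }

        // Sort by score descending, breaking ties alphabetically.
        List<Map.Entry<String, Double>> entries = new ArrayList<>(score.entrySet());
        entries.sort((a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });

        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

In a real index the term and document frequencies would presumably come from stored term vectors and the index statistics rather than from re-tokenized text, but the ranking step would look much the same.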

-Grant

On Jul 25, 2007, at 3:04 PM, Jens Grivolla wrote:

> Hello,
>
> I'm looking to extract significant terms characterizing a set of  
> documents (which in turn relate to a topic).
>
> This basically comes down to functionality similar to determining  
> the terms with the greatest offer weight (as used for blind  
> relevance feedback), or maximizing tf.idf (as is done in  
> MoreLikeThis).
>
> Is there anything like this already implemented, or do I need to  
> iterate through all documents in the set "manually", re-tokenize  
> each one (or maybe use TermVectors), and then calculate the weight  
> for each term?
>
> Thanks,
>    Jens
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


