In-Reply-To: <46A79EBF.4070500@grivolla.net>
References: <46A79EBF.4070500@grivolla.net>
Message-Id: <D79448B5-8134-4560-A5F4-FE2EBA810EDF@apache.org>
From: Grant Ingersoll <gsingers@apache.org>
Subject: Re: MoreLikeThis for multiple documents
Date: Thu, 26 Jul 2007 11:23:00 -0400
To: java-user@lucene.apache.org

I have some sample code for doing relevance feedback across multiple  
documents at http://www.cnlp.org/apachecon2005

It could be modified to provide more of the MoreLikeThis
functionality (i.e., determining important terms via tf/idf); for now
it just takes the top X terms.
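
To make the tf/idf idea concrete, here is a rough, self-contained sketch in plain Java. It is not the sample code from the link above and it does not use any Lucene classes: the class name TopTerms, the method topTerms, and the in-memory corpus of pre-tokenized documents are all made up for illustration. It assumes the score for a term is its frequency summed over the chosen documents, multiplied by ln(N / df) computed over the whole collection. Against a real index you would presumably take the term frequencies from term vectors and the document frequencies from the index instead of recounting them.

```java
import java.util.*;

// Hypothetical helper for illustration only; not part of Lucene.
class TopTerms {

    /**
     * Returns the n highest-scoring terms for the selected documents.
     * score(term) = (tf summed over the selected docs) * ln(N / df),
     * where N is the corpus size and df the number of docs containing the term.
     */
    static List<String> topTerms(List<List<String>> corpus, Set<Integer> selected, int n) {
        // Document frequency over the whole corpus (each doc counted once per term).
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }

        // Term frequency summed over the selected documents only.
        Map<String, Integer> tf = new HashMap<>();
        for (int docId : selected) {
            for (String term : corpus.get(docId)) {
                tf.merge(term, 1, Integer::sum);
            }
        }

        // Combine the two counts into a tf*idf weight per term.
        int numDocs = corpus.size();
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            score.put(e.getKey(), e.getValue() * idf);
        }

        // Sort by descending score; break ties alphabetically so the result is stable.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort(Comparator.comparingDouble((String t) -> -score.get(t))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene", "search", "the"),
                Arrays.asList("lucene", "query", "search", "the"),
                Arrays.asList("cooking", "recipe", "the"),
                Arrays.asList("garden", "soil", "the"),
                Arrays.asList("travel", "guide", "the"));
        // Characterize the set made of documents 0 and 1.
        System.out.println(topTerms(corpus, new HashSet<>(Arrays.asList(0, 1)), 3));
    }
}
```

A term that occurs in every document (like "the" here) gets an idf of zero and drops to the bottom, which is the main difference from simply taking the top X most frequent terms.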

-Grant

On Jul 25, 2007, at 3:04 PM, Jens Grivolla wrote:

> Hello,
>
> I'm looking to extract significant terms characterizing a set of  
> documents (which in turn relate to a topic).
>
> This basically comes down to functionality similar to determining  
> the terms with the greatest offer weight (as used for blind  
> relevance feedback), or maximizing tf.idf (as is done in  
> MoreLikeThis).
>
> Is there anything like this already implemented, or do I need to  
> iterate through all documents in the set "manually", re-tokenize  
> each one (or maybe use TermVectors), and then calculate the weight  
> for each term?
>
> Thanks,
>    Jens
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

