From: "Grant Ingersoll"
To: java-user@lucene.apache.org
Date: Fri, 03 Jun 2005 09:05:20 -0400
Subject: RE: calculate wi = tfi * IDFi for each document.

If you can, please consider a patch. There has been enough interest in this in the past that a patch exposing the wi information would probably be useful to others (not that I'm saying it would be committed, as I can't speak for the committers on the project).

>>> andrew.boyd@mindspring.com 6/3/2005 8:19:16 AM >>>
Thanks for the reply. It looks like I can use parts of Similarity.
I'll post back once I get it working, or at least closer ;-)

Andrew

-----Original Message-----
From: Grant Ingersoll
Sent: Jun 3, 2005 6:51 AM
To: java-user@lucene.apache.org
Subject: RE: calculate wi = tfi * IDFi for each document.

I think the TermFreqVector (reader.getTermFreqVector) has the info you want per document. You will need to sort it by frequency to get the top terms in each document. It doesn't give you wi, just tfi, but the whole score is implied by the fact that you have the top 10 documents, I think.

-Grant

>>> andrew.boyd@mindspring.com 6/2/2005 3:21:35 PM >>>
OK. So if I get 10 Documents back from a search and I want to get the top 5 weighted terms for each of the 10 documents, what API call should I use? I'm unable to find the connection between Similarity and a Document. I know I'm missing the elephant that must be in the middle of the room. Or maybe it's not there. Is what I'm trying to do doable?

Thanks,
Andrew

-----Original Message-----
From: Max Pfingsthorn
Sent: Jun 2, 2005 5:33 AM
To: java-user@lucene.apache.org
Subject: RE: calculate wi = tfi * IDFi for each document.

Hi,

DefaultSimilarity uses exactly this weighting scheme. Makes sense, since it's a pretty standard relevance measure...

Bye!
max

-----Original Message-----
From: Andrew Boyd [mailto:andrew.boyd@mindspring.com]
Sent: Thursday, June 02, 2005 11:39
To: java-user@lucene.apache.org
Subject: calculate wi = tfi * IDFi for each document.

If I have search results, how can I calculate, using Lucene's API, wi = tfi * IDFi for each document?
wi = term weight
tfi = term frequency in a document
IDFi = inverse document frequency = log(D/dfi)
dfi = document frequency, i.e. the number of documents containing term i
D = number of documents in my search result

Thanks,
Andrew

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Andrew Boyd
Software Architect
Sun Certified J2EE Architect
B&B Technical Services Inc.
205.422.2557
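Putting Grant's suggestion (per-document term frequencies from the term vector, then sorting) together with Andrew's definitions gives a small computation. The sketch below is plain Java with no Lucene dependency. It assumes each hit's term vector has already been copied into a `Map<String, Integer>` (term to frequency), for example from the parallel arrays returned by `TermFreqVector.getTerms()` and `getTermFrequencies()`. The class and method names (`TermWeights`, `documentFrequencies`, `weights`, `topTerms`, `defaultSimilarityStyleWeight`) are hypothetical. dfi is counted within the result set, so D is the number of hits, as Andrew defines it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: each Map<String, Integer> stands in for one hit's
// term vector (term -> frequency), copied out of Lucene beforehand.
public class TermWeights {

    // df_i: how many documents in the result set contain term i.
    public static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // w_i = tf_i * log(D / df_i) for every term of one document, with D = numDocs.
    public static Map<String, Double> weights(Map<String, Integer> doc,
                                              Map<String, Integer> df, int numDocs) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            w.put(e.getKey(), e.getValue() * idf);
        }
        return w;
    }

    // The k highest-weighted terms, best first; ties are broken alphabetically.
    public static List<String> topTerms(Map<String, Double> w, int k) {
        List<String> terms = new ArrayList<>(w.keySet());
        terms.sort(Comparator.comparing((String t) -> w.get(t)).reversed()
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    // For comparison only: the damped formulas of Lucene 1.x DefaultSimilarity,
    // tf = sqrt(freq) and idf = log(numDocs / (docFreq + 1)) + 1.
    public static double defaultSimilarityStyleWeight(int freq, int docFreq, int numDocs) {
        double tf = Math.sqrt(freq);
        double idf = Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
        return tf * idf;
    }

    public static void main(String[] args) {
        // Toy term vectors for a three-document search result (D = 3).
        List<Map<String, Integer>> docs = List.of(
                Map.of("lucene", 3, "index", 1, "java", 2),
                Map.of("lucene", 1, "search", 2),
                Map.of("lucene", 2, "java", 1, "score", 4));
        Map<String, Integer> df = documentFrequencies(docs);
        for (int i = 0; i < docs.size(); i++) {
            Map<String, Double> w = weights(docs.get(i), df, docs.size());
            System.out.println("doc " + i + ": top terms " + topTerms(w, 2));
        }
    }
}
```

A term that occurs in every hit (here "lucene") gets log(D/D) = 0, so it never ranks as a distinguishing term, however often it appears. Note that DefaultSimilarity is a damped variant rather than this exact formula: its tf() returns sqrt(freq), its idf() returns log(numDocs/(docFreq+1)) + 1, and the searcher supplies docFreq and numDocs for the whole index rather than for the result set, so its numbers will differ from Andrew's wi.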