From: Andrew Boyd
To: java-user@lucene.apache.org
Date: Fri, 3 Jun 2005 07:19:16 -0500 (GMT-05:00)
Subject: RE: calculate wi = tfi * IDFi for each document.

Thanks for the reply. It looks like I can use parts of Similarity.
I'll post back once I get it working or at least closer ;-)

Andrew

-----Original Message-----
From: Grant Ingersoll
Sent: Jun 3, 2005 6:51 AM
To: java-user@lucene.apache.org
Subject: RE: calculate wi = tfi * IDFi for each document.

I think the TermFreqVector (reader.getTermVector) has the info you want per document. You will need to sort it by frequency to get the top terms in each document. It doesn't give you the wi, just tfi, but the whole score is implied by the fact that you have the top 10 documents, I think.

-Grant

>>> andrew.boyd@mindspring.com 6/2/2005 3:21:35 PM >>>

Ok. So if I get 10 Documents back from a search and I want to get the top 5 weighted terms for each of the 10 documents what API call should I use? I'm unable to find the connection between Similarity and a Document. I know I'm missing the elephant that must be in the middle of the room. Or maybe it's not there. Is what I'm trying to do do-able?

Thanks,

Andrew

-----Original Message-----
From: Max Pfingsthorn
Sent: Jun 2, 2005 5:33 AM
To: java-user@lucene.apache.org
Subject: RE: calculate wi = tfi * IDFi for each document.

Hi,

DefaultSimilarity uses exactly this weighting scheme. Makes sense since it's a pretty standard relevance measure...

Bye!
max

-----Original Message-----
From: Andrew Boyd [mailto:andrew.boyd@mindspring.com]
Sent: Thursday, June 02, 2005 11:39
To: java-user@lucene.apache.org
Subject: calculate wi = tfi * IDFi for each document.

If I have search results how can I calculate, using lucene's API, wi = tfi * IDFi for each document.
wi   = term weight
tfi  = term frequency in a document
IDFi = inverse document frequency = log(D/dfi)
dfi  = document frequency or number of documents containing term i
D    = number of documents in my search result

Thanks,

Andrew

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Andrew Boyd
Software Architect
Sun Certified J2EE Architect
B&B Technical Services Inc.
205.422.2557
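
The weighting asked about in this thread, wi = tfi * log(D/dfi), can be sketched in plain Java as follows. This is only an illustration under stated assumptions: it works on in-memory token lists rather than a Lucene index, the class and method names (TfIdfSketch, documentFrequencies, weights, topTerms) are hypothetical, and it implements the textbook formula from the original question, which is not necessarily what Lucene's Similarity computes internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: documents are plain token lists.
final class TfIdfSketch {

    /** dfi: the number of documents that contain each term at least once. */
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // A set ensures each term is counted once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /**
     * wi = tfi * log(D / dfi) for every term of one document.
     * Assumes doc is one of the documents that df was built from,
     * so every term has a document frequency of at least 1.
     */
    static Map<String, Double> weights(List<String> doc,
                                       Map<String, Integer> df,
                                       int numDocs) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int dfi = df.get(e.getKey());
            w.put(e.getKey(), e.getValue() * Math.log((double) numDocs / dfi));
        }
        return w;
    }

    /** The n terms with the highest weight, heaviest first. */
    static List<String> topTerms(Map<String, Double> weights, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(weights.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Stand-in for the documents returned by a search (D = 3 here).
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene", "search"),
                Arrays.asList("search", "engine"),
                Arrays.asList("index", "search"));
        Map<String, Integer> df = documentFrequencies(docs);
        for (List<String> doc : docs) {
            Map<String, Double> w = weights(doc, df, docs.size());
            System.out.println(topTerms(w, 5) + " " + w);
        }
    }
}
```

Because D is taken as the size of the result set, a term that appears in every returned document gets a weight of zero, which matches the definition of D given in the question. With a real index, the per-document term frequencies would come from stored term vectors instead of token lists.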