From: PlusPlus
To: java-user@lucene.apache.org
Subject: Re: Why is frequency a float number
Date: Thu, 4 Mar 2010 11:49:00 -0800 (PST)
Message-ID: <27785693.post@talk.nabble.com>
References: <27714523.post@talk.nabble.com>

Thanks for the reply. Actually, what I'm looking for is a kind of fuzzy membership for the terms of a document.
That is, each term of a document will have a membership value, and each term will occur in a given document at most once. For that, I need float TF and IDF values. It seems that Lucene does not support this, and that I would have to change Lucene's code, which is not an easy task. Do you have any suggestions for me?

Best,
Reza

hossman wrote:
>
> : I was wondering why TF method gets a float parameter. Isn't frequency
> : always considered to be integer?
> :
> : public abstract float tf(float freq)
>
> Take a look at how PhraseQuery and SpanNearQuery use tf(float).
>
> For simple terms (and TermQuery) tf is always an integer, but when dealing
> with phrases the concept of a "sloppy match" (ie: a phrase with a gap in
> the middle) results in a fractional "frequency" value because it is not as
> good as an "exact" match on the phrase (which does result in an integer tf
> value)
>
> -Hoss
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--
View this message in context: http://old.nabble.com/Why-is-frequency-a-float-number-tp27714523p27785693.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
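
To make the "sloppy match" point in the quoted reply concrete, here is a minimal standalone sketch of how a fractional frequency can arise. It does not use Lucene at all: the class name SloppyTfSketch and the helper phraseFreq are made up for illustration, and the two formulas (1 / (distance + 1) per match, square root for tf) are modelled on what I believe DefaultSimilarity did at the time, so please check them against your own Lucene version.

```java
// Standalone illustration only; nothing here calls into Lucene.
class SloppyTfSketch {

    // Weight of a single phrase match whose terms are `distance`
    // positions away from an exact match (0 = exact).
    // Assumption: mirrors DefaultSimilarity.sloppyFreq of that era.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumption: mirrors DefaultSimilarity.tf(float), i.e. sqrt(freq).
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Hypothetical helper: sums the per-match weights for one document.
    // This sum is the "frequency" handed to tf(float), and it is
    // fractional as soon as any match is not exact.
    static float phraseFreq(int[] distances) {
        float freq = 0.0f;
        for (int d : distances) {
            freq += sloppyFreq(d);
        }
        return freq;
    }

    public static void main(String[] args) {
        // One exact match plus one match with a one-position gap.
        float freq = phraseFreq(new int[] {0, 1});
        System.out.println("phrase freq = " + freq);
        System.out.println("tf(freq)    = " + tf(freq));
    }
}
```

With only exact matches the sum stays a whole number, which is the integer case described for TermQuery; any gap makes it fractional, which is why tf takes a float.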