From: Doug Cutting
Date: Wed, 16 Nov 2005 12:32:51 -0800
To: java-dev@lucene.apache.org
Subject: Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

Yonik Seeley wrote:
> Hmmm, very interesting idea.
> Less than one decimal digit of precision might be hard to swallow when
> you have to add scores together though:
>
> smallfloat(score1) + smallfloat(score2) + smallfloat(score3)
>
> Do you think that the 5/3 exponent/mantissa split is right for this,
> or would a 4/4 be better?

The float epsilon should ideally be greater than the minimum score
increment, and the float range should ideally be at least 100x greater
than the maximum score increment, to permit boosting, large queries, etc.

Given a 100M document collection, the maximum idf is log(100M) = ~18,
with a length-normalized tf of 1, for a max of 18. So the float range
should ideally be around 1800 or greater.

The minimum idf is 1, and the minimum normalized tf with 10k word
documents is 1/100. So the float epsilon should ideally be less than
1/100.

5 bits of mantissa and 3 bits of exponent is closest to this, but not
quite there, with an epsilon of 1/32 and a range of up to ~1000.

Did I get the math right?

Doug
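
The estimate in the message above can be rechecked with a short script.
This is only a sketch and not Lucene code: the function names and the
headroom parameter are hypothetical, log is taken to be the natural
logarithm (which gives the ~18 quoted for 100M documents), and the
length normalization is assumed to be 1/sqrt(document length), which
gives the 1/100 quoted for 10k-word documents. The representable range
of a particular 8-bit encoding depends on how the exponent is biased,
so it is deliberately left out.

```python
import math


def score_requirements(num_docs, max_doc_length, headroom=100):
    """Rough bounds on score increments, following the email's reasoning.

    Assumptions (not taken from any library): idf is bounded by
    ln(num_docs), the largest normalized tf is 1, the smallest idf is 1,
    and the smallest normalized tf is 1/sqrt(max_doc_length).
    """
    max_increment = math.log(num_docs) * 1.0          # max idf * max tf
    min_increment = 1.0 / math.sqrt(max_doc_length)   # min idf * min tf
    return {
        "max_increment": max_increment,
        "min_increment": min_increment,
        # The email asks for a range about 100x the largest increment.
        "required_range": headroom * max_increment,
        # The email asks for an epsilon below the smallest increment.
        "required_epsilon": min_increment,
    }


def mantissa_epsilon(mantissa_bits):
    """Step size of a fraction stored in mantissa_bits bits."""
    return 2.0 ** -mantissa_bits


def min_mantissa_bits(required_epsilon):
    """Smallest bit count whose step size is at or below required_epsilon."""
    return math.ceil(math.log2(1.0 / required_epsilon))


if __name__ == "__main__":
    req = score_requirements(num_docs=100_000_000, max_doc_length=10_000)
    print("required range  :", round(req["required_range"]))
    print("required epsilon:", req["required_epsilon"])
    for bits in (3, 4, 5):
        print(bits, "mantissa bits -> epsilon", mantissa_epsilon(bits))
    print("bits needed     :", min_mantissa_bits(req["required_epsilon"]))
```

Under these assumptions, the epsilon of a 5-bit mantissa (1/32) is still
larger than the 1/100 target, which matches the "not quite there" in the
message.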