Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 52261 invoked from network); 4 Oct 2006 18:24:50 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 4 Oct 2006 18:24:50 -0000 Received: (qmail 8015 invoked by uid 500); 4 Oct 2006 18:24:40 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 7940 invoked by uid 500); 4 Oct 2006 18:24:40 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 7894 invoked by uid 99); 4 Oct 2006 18:24:40 -0000 Received: from idunn.apache.osuosl.org (HELO idunn.apache.osuosl.org) (140.211.166.84) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 04 Oct 2006 11:24:40 -0700 X-ASF-Spam-Status: No, hits=0.0 required=5.0 tests= Received: from [169.229.70.167] ([169.229.70.167:46998] helo=rescomp.berkeley.edu) by idunn.apache.osuosl.org (ecelerity 2.1.1.8 r(12930)) with ESMTP id 31/70-20288-46CF3254 for ; Wed, 04 Oct 2006 11:24:37 -0700 Received: by rescomp.berkeley.edu (Postfix, from userid 1007) id 0A14A5B776; Wed, 4 Oct 2006 11:24:33 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by rescomp.berkeley.edu (Postfix) with ESMTP id 009C37F403 for ; Wed, 4 Oct 2006 11:24:33 -0700 (PDT) Date: Wed, 4 Oct 2006 11:24:33 -0700 (PDT) From: Chris Hostetter To: java-user@lucene.apache.org Subject: Re: Number Proximity Query In-Reply-To: <85c493340610040115i3f198e87g31f8f2173b0ebab7@mail.gmail.com> Message-ID: References: <85c493340610031751x5ed170ebxd38f1c8475a81984@mail.gmail.com> <85c493340610040115i3f198e87g31f8f2173b0ebab7@mail.gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N : (1) Should values returned by DocValues (return from ValueSource) must : always betwen 1.0 and 0.0 ? How is this value affect the overall document : scores, assuming there are others Query clauses as well that is perform on : the document (on other fields). The "values" returned by the various methods in DocValues can be anything you want, as long as they conform to hte primitive datatype in the method sig -- ideally whatever float floatVal returns should be roughly equivilent to whever double you return from doubleVal so that your ValueSource appears to behave the same way regardless of whatever other ValueSources you may compose it in. How the values affect the over all Document score really depends on how big the values you return are, and how you compose your FunctionQuery with the other parts of your query (presumably in a BooleanQuery) Take a look at LinearFloatFunction and the DocValues it produces. it's a good example of what a "function" you want to be able to compose in a function query should do. If i remember your problem statement correctly, all you really need is an AbsoluteValueFunction that you could compose with some linear functions ala: linear(abs(linear(field(x), -1, N), -1, M))) ...where N is the magic number you want your doc vals to be close to, M is the biggest value you ever want your FunctionQuery to score a document with, and field(x) is where you put a FieldCacheSource on the field you care about. Your AbsoluteValueFunction would look a lot like LinearFloatFunction, with out the slope or intercept and a getValues method that looked something like... public DocValues getValues(IndexReader reader) throws IOException { final DocValues vals = source.getValues(reader); return new DocValues() { public float floatVal(int doc) { return Math.abs(vals.floatVal(doc)) } ... : (2) The documentation on the following functions is extremely lacking (no : matter where I looked). Any expert here can help out ? : : -- Weight.getValue() : what values should be returned for : NumberProximityQuery? : -- Weight.sumOfSquareWeights() : no idea what is this for??? : -- Weight.normalize() : still no idea : -- Scorer.score() : should this value always between 1.0 and 0.0 ? I honestly don't remember off the top of me head what those methods are for our how they come into play -- the scoring.html doc that's in progress should help clear up some of that. As i recall, the last time i needed to understand what those methods did, i looked at some of the primitive query types (like TermQuery and BooleanQuery) to see what they did. The one thing I can tell you with certainty is that there is nothing magical about scores between 0.0 and 1.0 -- the notion that Lucene Scores are between 0 and 1 is a myth perpetuated by the Hits interface which does a mock-normalization of the scores if the highest is greater then 1. When you get down into the bowels of scoring, the scores can be any float -- even negative numbers are legal scores. -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org